Unhandled Exception in AgentX++ : another detail for that Bug

Frank Fock Frank.Fock____t-online.de
Mon Mar 25 10:20:27 CET 2002


Hello Christel,

I think the assumption that the instance of Pdu has been deleted by another
thread cannot be true, because it is created and deleted within the receive
method and not used outside. The data used outside is always a copy of
the pdu.

Before trying anything else, I would remove the code you quoted from the
main loop. It calls the finished() method of a Request object that might be
already deleted. This may cause a jump to an arbitrary memory location
which could cause any type of fault.

Instead of using this construct, you would be better off using PDU-type-driven
dispatching to two ThreadPools, where the thread pool for SET requests
has size 1, so that SET requests are processed one at a time. You can
use the AgentX++ dispatching as an example.
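
To illustrate the idea, here is a minimal sketch of such a dispatcher. It is
not the AgentX++ code: SimplePool, Dispatcher, and PduType are hypothetical
names made up for this example, and it assumes a compiler with C++11 threads
rather than the AGENT++ ThreadPool class. SET work goes to a pool with a
single worker, so it runs strictly in arrival order; everything else goes to
a larger pool.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical PDU kinds for illustration; not the AgentX++ constants.
enum class PduType { Get, GetNext, Set };

// Minimal fixed-size thread pool (illustrative only).
class SimplePool {
public:
    explicit SimplePool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    // Lets the workers drain the queue, then joins them.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void execute(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stop_ = false;
};

// Routes work by PDU type: SET requests are serialized on a pool of size 1,
// all other requests share a pool of `readers` threads.
class Dispatcher {
public:
    explicit Dispatcher(std::size_t readers) : setPool_(1), readPool_(readers) {}
    void dispatch(PduType type, std::function<void()> work) {
        if (type == PduType::Set)
            setPool_.execute(std::move(work));
        else
            readPool_.execute(std::move(work));
    }
private:
    SimplePool setPool_;
    SimplePool readPool_;
};
```

With this layout the main loop only calls dispatch() and never has to wait on
a Request object it does not own, so there is no need to poll finished().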

Hope this helps.

Best regards,
Frank


christel.sohnemann____philips.com wrote:

> Hello again,
> after inserting some "printf's" in the AgentX++ code, I found out the following:
>
> The unhandled exception occurs in the marked (<******************************) line:
>
> void AgentXRequestList::answer(Request* req) TS_SYNCHRONIZED(
> {
>       // CAUTION: Make a copy of PDU here because when we answer
>       // the request, the response could be so fast back to our
>       // master, that the following request could be processed
>       // before we have finished here. In the case of a COMMIT
>       // or CLEANUP following a PREPARE_SET this could cause a seg
>       // fault because the request we are dealing with here is deleted
>       // by the main thread by calling AgentXRequestList::receive!
>       //
>       AgentXPdu* pdu = new AgentXPdu(*((AgentXPdu*)req->get_pdu()));
>
>       boolean remove = TRUE;
>       // check if we need request for further processing
>       switch (pdu->get_agentx_type()) {
>       case AGENTX_TESTSET_PDU:
>       case AGENTX_COMMITSET_PDU:
>         remove = FALSE;
>         break;
>       }
>       if (remove)
>             requests->remove(req);
>       pdu->set_agentx_type(AGENTX_RESPONSE_PDU);
>       int status = agentx->send(*pdu);
>
>       if (!remove) {
> printf("AgentXRequestList::answer : remove is FALSE");
>         // If we do not get a CLEANUP from the master
>         // we have to remove the pending request by ourselves.
>         ((AgentXRequest*)req)->get_agentx_pdu()->                       <****************************** requests PDU is not valid
>           set_timestamp(agentx->compute_timeout(AGENTX_DEFAULT_TIMEOUT));
>  printf("AgentXRequestList::answer : pdu timestamp has been set");
>         ((AgentXRequest*)req)->unlock();
>       }
>       LOG_BEGIN(EVENT_LOG | 4);
>       LOG("RequestListAgentX: request answered (id)(status)(tid)(err)(removed)(sz)");
>       LOG(pdu->get_request_id());
>       LOG(status);
>       LOG(pdu->get_transaction_id());
>       LOG(pdu->get_error_status());
>       LOG(remove);
>       LOG(pdu->get_vb_count());
>       LOG_END;
>
>       delete pdu;
>       if (remove)
>             delete req;
> })
>
> However, just before the exception occurs, the following line was executed by another thread (marked by <******************************):
>
> Request* AgentXRequestList::receive(int sec)
> {
> printf("==================> AgentXRequestList::receive \n");
>       int status = AGENTX_OK;
>       AgentXPdu* pdu = agentx->receive(sec, status);
>       if (!pdu)
>       {
> printf("<==================AgentXRequestList::receive : the PDU we received is NULL\n");
>             return 0;
>       }
>
>       LOG_BEGIN(EVENT_LOG | 2);
>       LOG("AgentXRequestList: request received (context)(tid)(pid)(siz)(type)(err)(status)");
>         LOG(pdu->get_context().get_printable());
>       LOG(pdu->get_transaction_id());
>       LOG(pdu->get_packet_id());
>       LOG(pdu->get_vb_count());
>       LOG(pdu->get_agentx_type());
>       LOG(pdu->get_error_status());
>       LOG(status);
>       LOG_END;
>
>       if (status == AGENTX_OK)
>       {
>             Array<MibEntry> locks;
>             switch (pdu->get_agentx_type()) {
>             case AGENTX_GET_PDU:
>             case AGENTX_GETNEXT_PDU:
>             case AGENTX_GETBULK_PDU:
>               // for each search range create an vb
>               // with the lower bound as oid and null as value
>               pdu->build_vbs_from_ranges();
>               break;
>             case AGENTX_COMMITSET_PDU:
>             case AGENTX_CLEANUPSET_PDU:
>             case AGENTX_UNDOSET_PDU:
> printf("AgentXRequestList::receive :: one of AGENTX_COMMITSET_PDU, AGENTX_CLEANUPSET_PDU, AGENTX_UNDOSET_PDU\n");
>               AgentXRequest* r =
>                 (AgentXRequest*)
>                 find_request_on_id(pdu->get_transaction_id());
>               if (!r) {
>                   // pdu does not follow a testset pdu -> ignore
>                   LOG_BEGIN(ERROR_LOG | 1);
>                   LOG("AgentXRequestList: commit, cleanup, or undo request does not follow a test set request (pid)(tid)(type)");
>                   LOG(pdu->get_packet_id());
>                   LOG(pdu->get_transaction_id());
>                   LOG(pdu->get_agentx_type());
>                   LOG_END;
> printf("AgentXRequestList::receive :: delete pdu\n");
>                   delete pdu;
> printf("<==================AgentXRequestList::receive :: after delete pdu\n");
>                   return 0;
>               }
>               // Acquire lock for the existing request, because
>               // it may be still in the queue, when the master
>               // timed out that request. We are blocking here
>               // but this case should be rare.
>               r->lock();
>               pdu->set_vblist(r->originalVbs, r->originalSize);
>               // copy locks
>               for (int i=0; i<r->originalSize; i++) {
>                   locks.add(r->get_locked(i));
>               }
>               r->locks.clear();
>               // is done by destructor: r->unlock();
>               delete requests->remove(r);
>             }
> printf("AgentXRequestList::receive :: create a new request with our pdu\n");
>             AgentXRequest* req = new AgentXRequest(*pdu);
>             // paste locks
>             if (locks.size()>0) {
>                   for (int i=0; i<locks.size(); i++) {
>                         req->locks.add(locks.getNth(i));
>                   }
>             }
>             locks.clear();
> printf("AgentXRequestList::receive :: delete pdu now(mark 1)\n");
>             delete pdu; <****************************** ZACK
> printf("<==================AgentXRequestList::receive :: after delete pdu (mark 1)\n");
>             // if request is to be ignored req will be
>             // deleted by add_request
>             return add_request(req);
>
>       } // end "if (status == AGENTX_OK)" begin "else"
>       else {
>             switch (status) {
>             default:
>               break;
>             }
> printf("AgentXRequestList::receive :: delete pdu (mark 2)\n");
>             delete pdu;
> printf("AgentXRequestList::receive :: after delete pdu (mark 2)\n");
>       }
>       return 0;
> printf("<==================AgentXRequestList::receive\n");
> }
>
> So, is it possible that the PDU accessed in the first method (answer) is the one that was just deleted in the receive method? As far as I can see, this PDU was assigned to a request object. Is that the same one?
>
> Is this the bug? Or is it possible that I am using the framework in the wrong way? (I copied most of the code dealing with master and subagent creation and receiving requests from the example files.) However, I added the following code to the
> master agent's receive-request loop:
>
>                   if ( sNMP_PDU_SET == pRequest->get_type())
>                   {
>                         while ( ! pRequest->finished() )
>                         {
>                               // yeah, we do nothing here: we have to wait until the request
>                               // is completely processed.
>                               // we can be sure that the request will finish at some point.
>                               // This is either when all Variable Bindings have been
>                               // processed or when an error occurred.
>                         };
>                   }
>
> I added this code because otherwise some problems occurred in the past (it is hard to remember exactly what).
>
> To your questions:
> 1.) It happens with one variable very quickly and often. When we run stress tests with our software, it also happens with other variables, but not as quickly.
> 2.) One of the subagents is loaded by the master agent, so it is hard to say. According to the code dump, yes, I am sure you are right: it is the subagent.
> 3.) Yes, it is.
> 4.) No answer required, because 1 is true.
> 5.) No answer required, because 3 is true.
>
> Unfortunately, I am not in my office on Monday. So, could you please also send the answer to my colleague ansgar.springub at philips.com?
> Thank you very much for your help. Best regards, Christel
>
> Christel Sohnemann
> Software Development
> Philips Speech Processing Aachen, Zweigniederlassung der Philips GmbH
> Kackertstr. 10, 52072 Aachen, Germany
> mailTo: christel.sohnemann at philips.com
> Tel:    +49 - (0)241 - 8871 191,    Fax: +49 - (0)241 - 8871 140
> http://www.speech.philips.com/




