Unhandled Exception in AgentX++ : another detail for that Bug

christel.sohnemann____philips.com christel.sohnemann____philips.com
Tue Mar 26 13:24:49 CET 2002


The problem occurs in the AgentXRequestList::answer method.

First, the PDU is copied, which is fine.
Later on, in the switch statement, the boolean variable "remove" is set to FALSE. That is why the second if block is executed and not the first. In the second if block there is the code "((AgentXRequest*)req)->get_agentx_pdu()->..." and that is exactly where it crashes. It crashes because the PDU of the request is invalid (the pointer is something like 0xaeaeaeae).
My guess is that this PDU object (the one IN the request) is deleted in AgentXRequestList::receive. I think so because my debug printfs seem to show that these lines of code are executed right before the crash.
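
The suspected pattern can be sketched as follows. This is only an illustration under stated assumptions: Pdu, Request, RequestList and dangles_after_removal are hypothetical stand-ins and not the AgentX++ classes, and a std::weak_ptr takes the place of the raw req pointer so that the effect can be observed without actually touching freed memory.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins; these are NOT the AgentX++ classes.
struct Pdu {
    long timestamp = 0;
};

struct Request {
    Pdu pdu;  // the PDU stored IN the request
};

// Owns the pending requests, similar in spirit to a request list.
class RequestList {
public:
    std::shared_ptr<Request> add() {
        requests_.push_back(std::make_shared<Request>());
        return requests_.back();
    }
    // Models a receive-like step that removes and deletes a pending request.
    void remove(const Request* r) {
        requests_.erase(
            std::remove_if(requests_.begin(), requests_.end(),
                           [r](const std::shared_ptr<Request>& p) {
                               return p.get() == r;
                           }),
            requests_.end());
    }

private:
    std::vector<std::shared_ptr<Request>> requests_;
};

// Returns true if the request seen by the "answer" side is gone
// after the "receive" side removed it from the list.
bool dangles_after_removal() {
    RequestList list;
    std::weak_ptr<Request> seen_by_answer;
    const Request* raw = nullptr;
    {
        std::shared_ptr<Request> req = list.add();
        seen_by_answer = req;  // non-owning reference, like a raw pointer
        raw = req.get();
    }                          // the list is now the sole owner
    list.remove(raw);          // the request is deleted here
    // A raw pointer would dangle now; the weak_ptr detects that safely.
    return seen_by_answer.expired();
}
```

Calling dangles_after_removal() returns true in this sketch: whoever still holds only a non-owning pointer to the request after the list has deleted it would be reading freed memory.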

Any idea?
best regards,
Christel





Christel Sohnemann
Software Development
Philips Speech Processing Aachen, Zweigniederlassung der Philips GmbH
Kackertstr. 10, 52072 Aachen, Germany
mailTo: christel.sohnemann____philips.com
Tel:    +49 - (0)241 - 8871 191,    Fax: +49 - (0)241 - 8871 140
http://www.speech.philips.com/


                                                                                                                                 
From:     Frank.Fock at t-online.de
To:       Christel Sohnemann/ACN/BE/PHILIPS at EMEA1
cc:       agentpp-dl at agentpp.com
Date:     26.03.2002 11:09
Subject:  Re: Unhandled Exception in AgentX++ : another detail for that Bug




OK, when it also crashes without the extra code in the
mainloop, then you can ignore what I wrote about using
thread pools to achieve that only a single SET is
processed at once.

But where in the code do you see that a pointer to the
AgentXPdu allocated in the receive method ever leaves
that method? So, if the subagent crashes when that pdu
is deleted, then IMHO there must be a memory fault
somewhere else, which should not be hard to find with
Purify.
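
As a minimal sketch of that argument: the types below are hypothetical stand-ins, not the AgentX++ classes, and the sketch assumes that the request's constructor stores its own copy of the PDU it is given. Under that assumption, deleting the received PDU cannot invalidate the request.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins; these are NOT the AgentX++ classes.
struct Pdu {
    int transaction_id;
    std::vector<int> vbs;
};

class Request {
public:
    // Assumption: the request keeps its own copy of the PDU.
    explicit Request(const Pdu& pdu) : pdu_(pdu) {}
    const Pdu& get_pdu() const { return pdu_; }

private:
    Pdu pdu_;
};

// Mirrors the shape of a receive method: allocate a PDU, build a
// request from it, delete the PDU, and hand the request to the caller.
std::unique_ptr<Request> receive_like(int transaction_id) {
    Pdu* pdu = new Pdu{transaction_id, {1, 2, 3}};
    std::unique_ptr<Request> req(new Request(*pdu));
    delete pdu;  // the pointer never leaves this function
    return req;
}
```

With this shape, receive_like(42)->get_pdu().transaction_id is still 42 after the original PDU is gone. If a request class kept only a pointer to the PDU instead of a copy, the delete would leave it dangling, so the constructor is the place to check.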

Sorry, I have no better solution ready to hand.

Best regards,
Frank

christel.sohnemann at philips.com wrote:
> Hi,
> when I remove the code as you said (but I do nothing else,
> because I do not understand what you mean), I still get
> the unhandled exception, and it still occurs in the
> "answer" method of the request list. So I still think my
> assumption is right.
> Christel
>
> From:     Frank.Fock at t-online.de (Frank Fock)
> To:       Christel Sohnemann/ACN/BE/PHILIPS at EMEA1
> cc:       agentpp-dl at agentpp.com, Ansgar Springub/ACN/BE/PHILIPS at EMEA1
> Date:     25.03.2002 10:20
> Subject:  Re: Unhandled Exception in AgentX++ : another detail for that Bug
>
> Hello Christel,
>
> I think the assumption that the instance of Pdu has been deleted by
> another thread cannot be true, because it is created and deleted within
> the receive method and not used outside. The data used outside is always
> a copy of the pdu.
>
> Before trying anything else, I would remove the code you quoted from the
> main loop. It calls the finished() method of a Request object that might
> already be deleted. This may cause a jump to an arbitrary memory location,
> which could cause any type of fault.
>
> Instead of using this construct, you would be better off using PDU-type-driven
> dispatching to two ThreadPools, where the thread pool for SET requests
> has size 1. You can use the AgentX++ dispatching as an example.
>
> Hope this helps.
>
> Best regards,
> Frank
>
>
> christel.sohnemann____philips.com wrote:
>
> > Hello again,
> > after inserting some printfs in the AgentX++ code, I found out the following:
> >
> > The unhandled exception occurs in the marked (<******************************) line:
> >
> > void AgentXRequestList::answer(Request* req) TS_SYNCHRONIZED(
> > {
> >       // CAUTION: Make a copy of PDU here because when we answer
> >       // the request, the response could be so fast back to our
> >       // master, that the following request could be processed
> >       // before we have finished here. In the case of a COMMIT
> >       // or CLEANUP following a PREPARE_SET this could cause a seg
> >       // fault because the request we are dealing with here is deleted
> >       // by the main thread by calling AgentXRequestList::receive!
> >       //
> >       AgentXPdu* pdu = new AgentXPdu(*((AgentXPdu*)req->get_pdu()));
> >
> >       boolean remove = TRUE;
> >       // check if we need request for further processing
> >       switch (pdu->get_agentx_type()) {
> >       case AGENTX_TESTSET_PDU:
> >       case AGENTX_COMMITSET_PDU:
> >         remove = FALSE;
> >         break;
> >       }
> >       if (remove)
> >             requests->remove(req);
> >       pdu->set_agentx_type(AGENTX_RESPONSE_PDU);
> >       int status = agentx->send(*pdu);
> >
> >       if (!remove) {
> > printf("AgentXRequestList::answer : remove is FALSE");
> >         // If we do not get a CLEANUP from the master
> >         // we have to remove the pending request by ourselves.
> >         ((AgentXRequest*)req)->get_agentx_pdu()->  // <****************************** request's PDU is not valid
> >           set_timestamp(agentx->compute_timeout(AGENTX_DEFAULT_TIMEOUT));
> >  printf("AgentXRequestList::answer : pdu timestamp was set");
> >         ((AgentXRequest*)req)->unlock();
> >       }
> >       LOG_BEGIN(EVENT_LOG | 4);
> >       LOG("RequestListAgentX: request answered (id)(status)(tid)(err)(removed)(sz)");
> >       LOG(pdu->get_request_id());
> >       LOG(status);
> >       LOG(pdu->get_transaction_id());
> >       LOG(pdu->get_error_status());
> >       LOG(remove);
> >       LOG(pdu->get_vb_count());
> >       LOG_END;
> >
> >       delete pdu;
> >       if (remove)
> >             delete req;
> > })
> >
> > However, just before the exception occurs, the following line was executed by another thread (marked by <******************************):
> >
> > Request* AgentXRequestList::receive(int sec)
> > {
> > printf("==================> AgentXRequestList::receive \n");
> >       int status = AGENTX_OK;
> >       AgentXPdu* pdu = agentx->receive(sec, status);
> >       if (!pdu)
> >       {
> > printf("<==================AgentXRequestList::receive : the PDU we received is NULL\n");
> >             return 0;
> >       }
> >
> >       LOG_BEGIN(EVENT_LOG | 2);
> >       LOG("AgentXRequestList: request received (context)(tid)(pid)(siz)(type)(err)(status)");
> >       LOG(pdu->get_context().get_printable());
> >       LOG(pdu->get_transaction_id());
> >       LOG(pdu->get_packet_id());
> >       LOG(pdu->get_vb_count());
> >       LOG(pdu->get_agentx_type());
> >       LOG(pdu->get_error_status());
> >       LOG(status);
> >       LOG_END;
> >
> >       if (status == AGENTX_OK)
> >       {
> >             Array<MibEntry> locks;
> >             switch (pdu->get_agentx_type()) {
> >             case AGENTX_GET_PDU:
> >             case AGENTX_GETNEXT_PDU:
> >             case AGENTX_GETBULK_PDU:
> >               // for each search range create an vb
> >               // with the lower bound as oid and null as value
> >               pdu->build_vbs_from_ranges();
> >               break;
> >             case AGENTX_COMMITSET_PDU:
> >             case AGENTX_CLEANUPSET_PDU:
> >             case AGENTX_UNDOSET_PDU:
> > printf("AgentXRequestList::receive :: one of AGENTX_COMMITSET_PDU, AGENTX_CLEANUPSET_PDU, AGENTX_UNDOSET_PDU\n");
> >               AgentXRequest* r =
> >                 (AgentXRequest*)
> >                 find_request_on_id(pdu->get_transaction_id());
> >               if (!r) {
> >                   // pdu does not follow a testset pdu -> ignore
> >                   LOG_BEGIN(ERROR_LOG | 1);
> >                   LOG("AgentXRequestList: commit, cleanup, or undo request does not follow a test set request (pid)(tid)(type)");
> >                   LOG(pdu->get_packet_id());
> >                   LOG(pdu->get_transaction_id());
> >                   LOG(pdu->get_agentx_type());
> >                   LOG_END;
> > printf("AgentXRequestList::receive :: delete pdu\n");
> >                   delete pdu;
> > printf("<==================AgentXRequestList::receive :: after delete pdu\n");
> >                   return 0;
> >               }
> >               // Acquire lock for the existing request, because
> >               // it may be still in the queue, when the master
> >               // timed out that request. We are blocking here
> >               // but this case should be rare.
> >               r->lock();
> >               pdu->set_vblist(r->originalVbs, r->originalSize);
> >               // copy locks
> >               for (int i=0; i<r->originalSize; i++) {
> >                   locks.add(r->get_locked(i));
> >               }
> >               r->locks.clear();
> >               // is done by destructor: r->unlock();
> >               delete requests->remove(r);
> >             }
> > printf("AgentXRequestList::receive :: create a new request with our pdu\n");
> >             AgentXRequest* req = new AgentXRequest(*pdu);
> >             // paste locks
> >             if (locks.size()>0) {
> >                   for (int i=0; i<locks.size(); i++) {
> >                         req->locks.add(locks.getNth(i));
> >                   }
> >             }
> >             locks.clear();
> > printf("AgentXRequestList::receive :: delete pdu now(mark 1)\n");
> >             delete pdu;  // <****************************** ZACK
> > printf("<==================AgentXRequestList::receive :: after delete pdu (mark 1)\n");
> >             // if request is to be ignored req will be
> >             // deleted by add_request
> >             return add_request(req);
> >
> >       } // end "if (status == AGENTX_OK)" begin "else"
> >       else {
> >             switch (status) {
> >             default:
> >               break;
> >             }
> > printf("AgentXRequestList::receive :: delete pdu (mark 2)\n");
> >             delete pdu;
> > printf("AgentXRequestList::receive :: after delete pdu (mark 2)\n");
> >       }
> >       return 0;
> > printf("<==================AgentXRequestList::receive\n");
> > }
> >
> > So, might it be possible that the PDU accessed in the first (the answer) method is the one that was just deleted in the receive method? As far as I can see, this PDU was assigned to a request object; is that the same one?
> >
> > Is this the bug? Or is it possible that I am using the framework in the wrong way? (I copied most of the code dealing with master and subagent creation and receiving requests from the example files.) However, I added the following code to the master agent's receive-request loop:
> >
> >       if ( sNMP_PDU_SET == pRequest->get_type())
> >       {
> >             while ( ! pRequest->finished() )
> >             {
> >                   // yeah, we do nothing here: we have to wait until the request
> >                   // is completely processed.
> >                   // we can be sure that the request will finish some time.
> >                   // This is either when all Variable Bindings have been
> >                   // processed or when an error occurred.
> >             };
> >       }
> >
> > I added this code because otherwise some problems occurred in the past (it is hard to remember which).
> >
> > To your questions:
> > 1.) It happens with one variable very quickly and often. But when performing some stress tests with our software, it also happens with other variables, though not as quickly.
> > 2.) One of the subagents is loaded by the master agent, so it is hard to say. According to the code dump, yes, I am sure you are right: it is the subagent.
> > 3.) Yes, it is.
> > 4.) No answer required, because 1 is true.
> > 5.) No answer required, because 3 is true.
> >
> > Unfortunately, I am not in my office on Monday. So could you please also send the answer to my colleague ansgar.springub at philips.com?
> > Thank you very much for your help. Best regards,
> > Christel
> >