Unhandled Exception in AgentX++ : another detail for that Bug

Frank.Fock____t-online.de
Tue Mar 26 14:06:32 CET 2002


christel.sohnemann at philips.com wrote:
> The problem occurs in the AgentXRequestList::answer 
> Method. 
>
OK, this was the misunderstanding. From your code snippet
I thought that the crash occurred in the receive method.

 
> First, the PDU is copied, which is fine.
> Later on, in the switch statement, the bool variable
> "remove" is set to false. That's why the second if is
> executed, and not the first. OK, in the second if, there
> is the code
> "((AgentXRequest*)req)->get_agentx_pdu()->..." and
> that is exactly where it crashes. It does so because the
> PDU of the request is invalid (pointer to 0xaeaeaeae
> something).

This would point to a request that has already been deleted
or that has been answered twice.
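
As an illustration of how a deleted or doubly answered request
can be detected instead of crashing, here is a minimal sketch in
plain C++. It is not AgentX++ code: SketchRequest,
SketchRequestList and all of their members are hypothetical
names invented for this example. The assumption is that the list
owns its requests and that answer() looks a request up by id, so
a stale id is reported rather than dereferenced.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a pending request (not the AgentX++ class).
struct SketchRequest {
    unsigned long transaction_id;
    bool keep_after_answer;  // true for requests that a later phase still needs
};

// Hypothetical request list that owns its requests and hands out ids only.
class SketchRequestList {
public:
    void add(unsigned long id, bool keep_after_answer) {
        std::lock_guard<std::mutex> guard(lock_);
        requests_[id] = std::unique_ptr<SketchRequest>(
            new SketchRequest{id, keep_after_answer});
    }

    // Returns false if the id is unknown, i.e. the request was already
    // removed (deleted or answered before). Requests marked as kept stay
    // in the list until remove() is called.
    bool answer(unsigned long id) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = requests_.find(id);
        if (it == requests_.end())
            return false;
        if (!it->second->keep_after_answer)
            requests_.erase(it);
        return true;
    }

    // Returns true if a request with this id was present and got removed.
    bool remove(unsigned long id) {
        std::lock_guard<std::mutex> guard(lock_);
        return requests_.erase(id) > 0;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(lock_);
        return requests_.size();
    }

private:
    mutable std::mutex lock_;
    std::map<unsigned long, std::unique_ptr<SketchRequest>> requests_;
};
```

With this layout a second answer(id) for an already removed
request simply returns false. Whether such a lookup fits the real
request list is a design question this sketch does not settle.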

> My guess is that this pdu object (the one IN the request)
> is deleted in AgentXRequestList::receive ???? I guess so,
> because my debug printfs seem to show me that these lines
> of code are executed right before the crash ....
> 
That's what I tried to explain. The receive method never
deletes the AgentXPdu object of a request, because it only
works on its own copy. Thus, the request/pdu deletion must be
occurring somewhere else.
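
The ownership argument can be shown with a small stand-alone
sketch. SketchPdu, SketchRequest and sketch_receive() are
hypothetical names made up for this illustration, not the
AgentX++ classes. The assumption is only what is stated above:
the request is built from a copy of the received pdu, so deleting
the received pdu afterwards does not touch the request's data.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a received PDU (not the AgentX++ class).
struct SketchPdu {
    unsigned long transaction_id;
    std::vector<std::string> vbs;
};

// Hypothetical request that stores its own copy of the PDU by value.
class SketchRequest {
public:
    explicit SketchRequest(const SketchPdu& pdu) : pdu_(pdu) {}
    const SketchPdu& pdu() const { return pdu_; }

private:
    SketchPdu pdu_;  // independent copy, owned by the request
};

// Mimics the pattern discussed above: allocate a pdu, build a request
// from a copy of it, then delete the local pdu before returning.
// The caller owns the returned request.
SketchRequest* sketch_receive(unsigned long transaction_id) {
    SketchPdu* pdu = new SketchPdu();
    pdu->transaction_id = transaction_id;
    pdu->vbs.push_back("1.3.6.1.2.1.1.5.0");

    SketchRequest* req = new SketchRequest(*pdu);  // copies the pdu
    delete pdu;  // only the local pdu is gone; req keeps its own copy
    return req;
}
```

After sketch_receive() returns, req->pdu() is still valid because
it never referred to the deleted object.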

So far, I have not been able to reproduce the
problem, but I am still trying...

Best regards,
Frank

> 
> 
> OK, if it also crashes without the extra code in the
> main loop, then you can ignore what I wrote about using
> thread pools to ensure that only a single SET is
> processed at a time.
> 
> But where in the code do you see that a pointer to the
> AgentXPdu allocated in the receive method ever leaves
> that method? So, if the subagent crashes when that pdu
> is deleted, then IMHO there must be a memory fault
> somewhere else, which should not be hard to find with
> Purify.
> 
> Sorry, I have no better solution at hand.
> 
> Best regards,
> Frank
> 
> christel.sohnemann at philips.com wrote:
> > Hi,
> > when I remove the code as you said (but I do nothing
> > else, because I do not understand what you mean) I still
> > get the unhandled exception and it still occurs in the
> > "answer" method of the request list. So I still guess my
> > assumption is right.
> > Christel
> >
> >
> >
> >
> > Christel Sohnemann
> > Software Development
> > Philips Speech Processing Aachen, Zweigniederlassung der Philips GmbH
> > Kackertstr. 10, 52072 Aachen, Germany
> > mailTo: christel.sohnemann at philips.com
> > Tel:    +49 - (0)241 - 8871 191,    Fax: +49 - (0)241 - 8871 140
> > http://www.speech.philips.com/
> >
> > From:    Frank.Fock at t-online.de (Frank Fock)
> > Date:    25.03.2002 10:20
> > To:      Christel Sohnemann/ACN/BE/PHILIPS at EMEA1
> > cc:      agentpp-dl at agentpp.com
> >          Ansgar Springub/ACN/BE/PHILIPS at EMEA1
> > Subject: Re: Unhandled Exception in AgentX++ : another detail for that Bug
> >
> > Hello Christel,
> >
> > I think the assumption that the instance of Pdu has been
> > deleted by another thread cannot be true, because it is
> > created and deleted within the receive method and not used
> > outside. The data used outside is always a copy of the pdu.
> >
> > Before trying anything else, I would remove the code you
> > quoted from the main loop. It calls the finished() method
> > of a Request object that might already be deleted. This may
> > cause a jump to an arbitrary memory location, which could
> > cause any type of fault.
> >
> > Instead of using this construct, you would do better to use
> > PDU-type-driven dispatching to two ThreadPools, where the
> > thread pool for SET requests has size 1. You can use the
> > AgentX++ dispatching as an example.
> >
> > Hope this helps.
> >
> > Best regards,
> > Frank
> >
> >
> > christel.sohnemann____philips.com wrote:
> >
> > > Hello again,
> > > after inserting some "printf's" in the AgentX++ code, I
> > > found out the following:
> > >
> > > The unhandled exception occurs in the marked
> > > (<******************************) line:
> > >
> > > void AgentXRequestList::answer(Request* req) TS_SYNCHRONIZED(
> > > {
> > >       // CAUTION: Make a copy of PDU here because when we answer
> > >       // the request, the response could be so fast back to our
> > >       // master, that the following request could be processed
> > >       // before we have finished here. In the case of a COMMIT
> > >       // or CLEANUP following a PREPARE_SET this could cause a seg
> > >       // fault because the request we are dealing with here is deleted
> > >       // by the main thread by calling AgentXRequestList::receive!
> > >       //
> > >       AgentXPdu* pdu = new AgentXPdu(*((AgentXPdu*)req->get_pdu()));
> > >
> > >       boolean remove = TRUE;
> > >       // check if we need request for further processing
> > >       switch (pdu->get_agentx_type()) {
> > >       case AGENTX_TESTSET_PDU:
> > >       case AGENTX_COMMITSET_PDU:
> > >         remove = FALSE;
> > >         break;
> > >       }
> > >       if (remove)
> > >             requests->remove(req);
> > >       pdu->set_agentx_type(AGENTX_RESPONSE_PDU);
> > >       int status = agentx->send(*pdu);
> > >
> > >       if (!remove) {
> > > printf("AgentXRequestList::answer : remove is FALSE");
> > >         // If we do not get a CLEANUP from the master
> > >         // we have to remove the pending request by ourselves.
> > >         ((AgentXRequest*)req)->get_agentx_pdu()->   <****************************** requests PDU is not valid
> > >           set_timestamp(agentx->compute_timeout(AGENTX_DEFAULT_TIMEOUT));
> > >  printf("AgentXRequestList::answer : pdu timestamp was set");
> > >         ((AgentXRequest*)req)->unlock();
> > >       }
> > >       LOG_BEGIN(EVENT_LOG | 4);
> > >       LOG("RequestListAgentX: request answered (id)(status)(tid)(err)(removed)(sz)");
> > >       LOG(pdu->get_request_id());
> > >       LOG(status);
> > >       LOG(pdu->get_transaction_id());
> > >       LOG(pdu->get_error_status());
> > >       LOG(remove);
> > >       LOG(pdu->get_vb_count());
> > >       LOG_END;
> > >
> > >       delete pdu;
> > >       if (remove)
> > >             delete req;
> > > })
> > >
> > > However, just before the exception occurs, the following
> > > line was executed by another thread (marked by
> > > <******************************):
> > >
> > > Request* AgentXRequestList::receive(int sec)
> > > {
> > > printf("==================> AgentXRequestList::receive \n");
> > >       int status = AGENTX_OK;
> > >       AgentXPdu* pdu = agentx->receive(sec, status);
> > >       if (!pdu)
> > >       {
> > > printf("<==================AgentXRequestList::receive : the PDU we received is NULL\n");
> > >             return 0;
> > >       }
> > >
> > >       LOG_BEGIN(EVENT_LOG | 2);
> > >       LOG("AgentXRequestList: request received (context)(tid)(pid)(siz)(type)(err)(status)");
> > >       LOG(pdu->get_context().get_printable());
> > >       LOG(pdu->get_transaction_id());
> > >       LOG(pdu->get_packet_id());
> > >       LOG(pdu->get_vb_count());
> > >       LOG(pdu->get_agentx_type());
> > >       LOG(pdu->get_error_status());
> > >       LOG(status);
> > >       LOG_END;
> > >
> > >       if (status == AGENTX_OK)
> > >       {
> > >             Array<MibEntry> locks;
> > >             switch (pdu->get_agentx_type()) {
> > >             case AGENTX_GET_PDU:
> > >             case AGENTX_GETNEXT_PDU:
> > >             case AGENTX_GETBULK_PDU:
> > >               // for each search range create a vb
> > >               // with the lower bound as oid and null as value
> > >               pdu->build_vbs_from_ranges();
> > >               break;
> > >             case AGENTX_COMMITSET_PDU:
> > >             case AGENTX_CLEANUPSET_PDU:
> > >             case AGENTX_UNDOSET_PDU:
> > > printf("AgentXRequestList::receive :: one of AGENTX_COMMITSET_PDU, AGENTX_CLEANUPSET_PDU, AGENTX_UNDOSET_PDU\n");
> > >               AgentXRequest* r =
> > >                 (AgentXRequest*)
> > >                 find_request_on_id(pdu->get_transaction_id());
> > >               if (!r) {
> > >                   // pdu does not follow a testset pdu -> ignore
> > >                   LOG_BEGIN(ERROR_LOG | 1);
> > >                   LOG("AgentXRequestList: commit, cleanup, or undo request does not follow a test set request (pid)(tid)(type)");
> > >                   LOG(pdu->get_packet_id());
> > >                   LOG(pdu->get_transaction_id());
> > >                   LOG(pdu->get_agentx_type());
> > >                   LOG_END;
> > > printf("AgentXRequestList::receive :: delete pdu\n");
> > >                   delete pdu;
> > > printf("<==================AgentXRequestList::receive :: after delete pdu\n");
> > >                   return 0;
> > >               }
> > >               // Acquire lock for the existing request, because
> > >               // it may be still in the queue, when the master
> > >               // timed out that request. We are blocking here
> > >               // but this case should be rare.
> > >               r->lock();
> > >               pdu->set_vblist(r->originalVbs, r->originalSize);
> > >               // copy locks
> > >               for (int i=0; i<r->originalSize; i++) {
> > >                   locks.add(r->get_locked(i));
> > >               }
> > >               r->locks.clear();
> > >               // is done by destructor: r->unlock();
> > >               delete requests->remove(r);
> > >             }
> > > printf("AgentXRequestList::receive :: create a new request with our pdu\n");
> > >             AgentXRequest* req = new AgentXRequest(*pdu);
> > >             // paste locks
> > >             if (locks.size()>0) {
> > >                   for (int i=0; i<locks.size(); i++) {
> > >                       req->locks.add(locks.getNth(i));
> > >                   }
> > >             }
> > >             locks.clear();
> > > printf("AgentXRequestList::receive :: delete pdu now(mark 1)\n");
> > >             delete pdu;   <****************************** ZACK
> > > printf("<==================AgentXRequestList::receive :: after delete pdu (mark 1)\n");
> > >             // if request is to be ignored req will be
> > >             // deleted by add_request
> > >             return add_request(req);
> > >
> > >       } // end "if (status == AGENTX_OK)" begin "else"
> > >       else {
> > >             switch (status) {
> > >             default:
> > >               break;
> > >             }
> > > printf("AgentXRequestList::receive :: delete pdu (mark 2)\n");
> > >             delete pdu;
> > > printf("AgentXRequestList::receive :: after delete pdu (mark 2)\n");
> > >       }
> > >       return 0;
> > > printf("<==================AgentXRequestList::receive\n");
> > > }
> > >
> > > So, might it be possible that the pdu accessed in the
> > > first (the answer) method is the one that was just
> > > deleted in the receive method? As far as I can see,
> > > this PDU was assigned to a request object; is that the
> > > same one?
> > >
> > > Is this the bug? Or is it possible that I am using the
> > > framework in the wrong way? (I copied most of the code
> > > dealing with master and subagent creation and receiving
> > > requests from the example files.) However, I added the
> > > following code to the master agent's receive-request loop:
> > >
> > >                   if ( sNMP_PDU_SET == pRequest->get_type())
> > >                   {
> > >                         while ( ! pRequest->finished() )
> > >                         {
> > >                               // yeah, we do nothing here: we have to wait until the request
> > >                               // is completely processed.
> > >                               // we can be sure that the request will finish some time.
> > >                               // This is either when all Variable Bindings have been
> > >                               // processed or when an error occurred.
> > >                         };
> > >                   }
> > >
> > > I added this code because otherwise some problems
> > > occurred in the past (hard to remember what).
> > >
> > > To your questions:
> > > 1.) It happens with one variable very quickly and
> > > often. But when performing some stress tests with our
> > > software, it also happens with other variables, but not
> > > that quickly.
> > > 2.) One of the subagents is loaded by the master agent,
> > > so it is hard to say. According to the code dump, yes, I
> > > am sure you are right, it is the subagent.
> > > 3.) Yes, it is.
> > > 4.) No answer required, because 1 is true.
> > > 5.) No answer required, because 3 is true.
> > >
> > > Unfortunately, I am not in my office on Monday. So,
> > > could you please also send the answer to my colleague
> > > ansgar.springub at philips.com?
> > > Thank you very much for your help. Best regards,
> > > Christel
> > >
> > > Christel Sohnemann
> > > Software Development
> > > Philips Speech Processing Aachen, Zweigniederlassung der Philips GmbH
> > > Kackertstr. 10, 52072 Aachen, Germany
> > > mailTo: christel.sohnemann at philips.com
> > > Tel:    +49 - (0)241 - 8871 191,    Fax: +49 - (0)241 - 8871 140
> > > http://www.speech.philips.com/


