Unhandled Exception in AgentX++ : another detail for that Bug
Frank.Fock____t-online.de
Tue Mar 26 14:06:32 CET 2002
christel.sohnemann at philips.com wrote:
> The problem occurs in the AgentXRequestList::answer
> Method.
>
OK, this was the misunderstanding. From your code snippet
I thought that the crash occurred in the receive method.
> First, the PDU is copied, which is fine.
> Later on, in the switch statement, the bool variable
> "remove" is set to false. That's why the second if is
> executed, and not the first. OK, in the second if, there
> is the code
> "((AgentXRequest*)req)->get_agentx_pdu()->..." ... and
> that is exactly where
> it crashes. It does so because the PDU of the request is
> invalid (pointer to 0xaeaeaeae something).
This would point to a request that has already been deleted
or that has been answered twice.
> My guess is that this pdu object (the one IN the request)
> is deleted in AgentXRequestList::receive ???? I guess so,
> because my debug-printfs seem to show me that these lines
> of code are executed right before the crash ....
>
That's what I tried to explain. The receive method never
deletes the AgentXPdu object of a request, because it only
works on its own copy. Thus, the request/PDU deletion must
be occurring somewhere else.
Until now, I have not been able to reproduce the
problem, but I am still trying...
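
To illustrate the ownership argument, here is a minimal sketch.
SketchPdu, SketchRequest and sketch_receive are hypothetical
stand-ins made up for this mail, not the AgentX++ classes; the
sketch only assumes, as described above, that a request stores
its own copy of the PDU it is created from:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for AgentXPdu: a plain value type with a deep copy.
struct SketchPdu {
    int transaction_id;
    std::vector<int> varbinds;
};

// Hypothetical stand-in for AgentXRequest: it stores its own copy of the PDU.
class SketchRequest {
public:
    explicit SketchRequest(const SketchPdu& pdu) : pdu_(pdu) {}
    SketchPdu* get_pdu() { return &pdu_; }
private:
    SketchPdu pdu_;  // owned copy, independent of the caller's object
};

// Mirrors the shape of receive(): allocate a PDU, copy it into a new
// request, delete the local PDU, and hand out only the request.
SketchRequest* sketch_receive(int transaction_id) {
    SketchPdu* pdu = new SketchPdu{transaction_id, {1, 2, 3}};
    SketchRequest* req = new SketchRequest(*pdu);
    delete pdu;  // only the local object dies; req keeps its copy
    return req;
}
```

After sketch_receive(42) returns, req->get_pdu()->transaction_id
still reads 42 although the PDU allocated inside the function is
gone. Under that assumption, a dangling PDU pointer inside a
request would have to come from the request itself being deleted.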
Best regards,
Frank
>
>
> OK, when it also crashes without the extra code in the
> mainloop, then you can ignore what I wrote about using
> thread pools to achieve that only a single SET is
> processed at once.
>
> But where in the code do you see that a pointer to the
> AgentXPdu allocated in the receive method ever leaves
> that method? So, if the subagent crashes when that pdu
> is deleted, then IMHO there must be a memory fault
> somewhere else, which should not be hard to find with
> Purify.
>
> Sorry, I have no better solution ready to hand.
>
> Best regards,
> Frank
>
> christel.sohnemann at philips.com wrote:
> > Hi,
> > when I remove the code as you said (but I do nothing else
> > because I do not understand what you mean) I still get
> > the unhandled exception and it still occurs in the
> > "answer" method of the requestlist. So I still guess my
> > assumption is right.
> > Christel
> >
> > Christel Sohnemann
> > Software Development
> > Philips Speech Processing Aachen, Zweigniederlassung der
> > Philips GmbH
> > Kackertstr. 10, 52072 Aachen, Germany
> > mailTo: christel.sohnemann at philips.com
> > Tel: +49 - (0)241 - 8871 191, Fax: +49 - (0)241 - 8871 140
> > http://www.speech.philips.com/
> >
> > Frank.Fock at t-online.de (Frank Fock)
> > 25.03.2002 10:20
> > To: Christel Sohnemann/ACN/BE/PHILIPS at EMEA1
> > cc: agentpp-dl at agentpp.com, Ansgar Springub/ACN/BE/PHILIPS at EMEA1
> > Subject: Re: Unhandled Exception in AgentX++ : another detail for that Bug
> >
> > Hello Christel,
> >
> > I think the assumption that the instance of Pdu has been
> > deleted by another thread cannot be true, because it is
> > created and deleted within the receive method and not used
> > outside. The data used outside is always a copy of the pdu.
> >
> > Before trying anything else, I would remove the code you
> > quoted from the main loop. It calls the finished() method
> > of a Request object that might be already deleted. This may
> > cause a jump to an arbitrary memory location which could
> > cause any type of fault.
> >
> > Instead of using this construct, you could use PDU type
> > driven dispatching to two ThreadPools, where the thread
> > pool for SET requests would have the size 1. You can use
> > the AgentX++ dispatching as an example.
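
A minimal sketch of the dispatching idea quoted above, using only
the C++ standard library. SketchPool, SketchDispatcher,
SketchPduType and max_concurrent_sets are hypothetical names made
up for illustration; they are not the AgentX++ ThreadPool or its
dispatching code, and the read pool size of 4 is an arbitrary
assumption. The only point shown is that a pool with a single
worker processes SET handlers one at a time:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool; a stand-in, not the AgentX++ ThreadPool.
class SketchPool {
public:
    explicit SketchPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }
    // Lets the workers drain the queue, then joins them.
    ~SketchPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_all();
        for (std::thread& t : threads_) t.join();
    }
    void execute(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wakeup_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wakeup_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock
        }
    }
    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};

enum class SketchPduType { Get, GetNext, Set };

// Routes SET handlers to a one-thread pool, everything else to a larger one.
class SketchDispatcher {
public:
    SketchDispatcher() : set_pool_(1), read_pool_(4) {}
    void dispatch(SketchPduType type, std::function<void()> handler) {
        SketchPool& pool = (type == SketchPduType::Set) ? set_pool_ : read_pool_;
        pool.execute(std::move(handler));
    }
private:
    SketchPool set_pool_;
    SketchPool read_pool_;
};

// Dispatches `count` SET handlers and returns the largest number that
// were ever running at the same time.
int max_concurrent_sets(int count) {
    std::atomic<int> running{0};
    std::atomic<int> peak{0};
    {
        SketchDispatcher dispatcher;
        for (int i = 0; i < count; ++i) {
            dispatcher.dispatch(SketchPduType::Set, [&running, &peak] {
                int now = ++running;
                int seen = peak.load();
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {}
                std::this_thread::yield();
                --running;
            });
        }
    }  // both pool destructors have drained and joined here
    return peak.load();
}
```

With this layout the receiving loop never has to poll a request
object to wait for a SET to finish; the serialization comes from
the pool size alone.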
> >
> > Hope this helps.
> >
> > Best regards,
> > Frank
> >
> >
> > christel.sohnemann____philips.com wrote:
> >
> > > Hello again,
> > > after inserting some "printf's" in the AgentX++ code, I
> > > found out the following:
> > >
> > > The unhandled exception occurs in the marked
> > > (<******************************) line:
> > >
> > > void AgentXRequestList::answer(Request* req)
> > > TS_SYNCHRONIZED(
> > > {
> > >     // CAUTION: Make a copy of PDU here because when we answer
> > >     // the request, the response could be so fast back to our
> > >     // master, that the following request could be processed
> > >     // before we have finished here. In the case of a COMMIT
> > >     // or CLEANUP following a PREPARE_SET this could cause a seg
> > >     // fault because the request we are dealing with here is deleted
> > >     // by the main thread by calling AgentXRequestList::receive!
> > >     //
> > >     AgentXPdu* pdu = new AgentXPdu(*((AgentXPdu*)req->get_pdu()));
> > >
> > >     boolean remove = TRUE;
> > >     // check if we need request for further processing
> > >     switch (pdu->get_agentx_type()) {
> > >     case AGENTX_TESTSET_PDU:
> > >     case AGENTX_COMMITSET_PDU:
> > >         remove = FALSE;
> > >         break;
> > >     }
> > >     if (remove)
> > >         requests->remove(req);
> > >     pdu->set_agentx_type(AGENTX_RESPONSE_PDU);
> > >     int status = agentx->send(*pdu);
> > >
> > >     if (!remove) {
> > >         printf("AgentXRequestList::answer : remove is FALSE");
> > >         // If we do not get a CLEANUP from the master
> > >         // we have to remove the pending request by ourselves.
> > >         ((AgentXRequest*)req)->get_agentx_pdu()->   <****************************** request's PDU is not valid
> > >             set_timestamp(agentx->compute_timeout(AGENTX_DEFAULT_TIMEOUT));
> > >         printf("AgentXRequestList::answer : pdu timestamp was set");
> > >         ((AgentXRequest*)req)->unlock();
> > >     }
> > >     LOG_BEGIN(EVENT_LOG | 4);
> > >     LOG("RequestListAgentX: request answered (id)(status)(tid)(err)(removed)(sz)");
> > >     LOG(pdu->get_request_id());
> > >     LOG(status);
> > >     LOG(pdu->get_transaction_id());
> > >     LOG(pdu->get_error_status());
> > >     LOG(remove);
> > >     LOG(pdu->get_vb_count());
> > >     LOG_END;
> > >
> > >     delete pdu;
> > >     if (remove)
> > >         delete req;
> > > })
> > >
> > > However, just before the exception occurs, the following
> > > line was executed by another thread (marked by
> > > <******************************):
> > >
> > > Request* AgentXRequestList::receive(int sec)
> > > {
> > >     printf("==================> AgentXRequestList::receive \n");
> > >     int status = AGENTX_OK;
> > >     AgentXPdu* pdu = agentx->receive(sec, status);
> > >     if (!pdu)
> > >     {
> > >         printf("<==================AgentXRequestList::receive : the PDU we received is NULL\n");
> > >         return 0;
> > >     }
> > >
> > >     LOG_BEGIN(EVENT_LOG | 2);
> > >     LOG("AgentXRequestList: request received (context)(tid)(pid)(siz)(type)(err)(status)");
> > >     LOG(pdu->get_context().get_printable());
> > >     LOG(pdu->get_transaction_id());
> > >     LOG(pdu->get_packet_id());
> > >     LOG(pdu->get_vb_count());
> > >     LOG(pdu->get_agentx_type());
> > >     LOG(pdu->get_error_status());
> > >     LOG(status);
> > >     LOG_END;
> > >
> > >     if (status == AGENTX_OK)
> > >     {
> > >         Array<MibEntry> locks;
> > >         switch (pdu->get_agentx_type()) {
> > >         case AGENTX_GET_PDU:
> > >         case AGENTX_GETNEXT_PDU:
> > >         case AGENTX_GETBULK_PDU:
> > >             // for each search range create an vb
> > >             // with the lower bound as oid and null as value
> > >             pdu->build_vbs_from_ranges();
> > >             break;
> > >         case AGENTX_COMMITSET_PDU:
> > >         case AGENTX_CLEANUPSET_PDU:
> > >         case AGENTX_UNDOSET_PDU:
> > >             printf("AgentXRequestList::receive :: one of AGENTX_COMMITSET_PDU, AGENTX_CLEANUPSET_PDU, AGENTX_UNDOSET_PDU\n");
> > >             AgentXRequest* r =
> > >                 (AgentXRequest*)find_request_on_id(pdu->get_transaction_id());
> > >             if (!r) {
> > >                 // pdu does not follow a testset pdu -> ignore
> > >                 LOG_BEGIN(ERROR_LOG | 1);
> > >                 LOG("AgentXRequestList: commit, cleanup, or undo request does not follow a test set request (pid)(tid)(type)");
> > >                 LOG(pdu->get_packet_id());
> > >                 LOG(pdu->get_transaction_id());
> > >                 LOG(pdu->get_agentx_type());
> > >                 LOG_END;
> > >                 printf("AgentXRequestList::receive :: delete pdu\n");
> > >                 delete pdu;
> > >                 printf("<==================AgentXRequestList::receive :: after delete pdu\n");
> > >                 return 0;
> > >             }
> > >             // Acquire lock for the existing request, because
> > >             // it may be still in the queue, when the master
> > >             // timed out that request. We are blocking here
> > >             // but this case should be rare.
> > >             r->lock();
> > >             pdu->set_vblist(r->originalVbs, r->originalSize);
> > >             // copy locks
> > >             for (int i=0; i<r->originalSize; i++) {
> > >                 locks.add(r->get_locked(i));
> > >             }
> > >             r->locks.clear();
> > >             // is done by destructor: r->unlock();
> > >             delete requests->remove(r);
> > >         }
> > >         printf("AgentXRequestList::receive :: create a new request with our pdu\n");
> > >         AgentXRequest* req = new AgentXRequest(*pdu);
> > >         // paste locks
> > >         if (locks.size()>0) {
> > >             for (int i=0; i<locks.size(); i++) {
> > >                 req->locks.add(locks.getNth(i));
> > >             }
> > >         }
> > >         locks.clear();
> > >         printf("AgentXRequestList::receive :: delete pdu now(mark 1)\n");
> > >         delete pdu;   <****************************** ZACK
> > >         printf("<==================AgentXRequestList::receive :: after delete pdu (mark 1)\n");
> > >         // if request is to be ignored req will be
> > >         // deleted by add_request
> > >         return add_request(req);
> > >
> > >     } // end "if (status == AGENTX_OK)" begin "else"
> > >     else {
> > >         switch (status) {
> > >         default:
> > >             break;
> > >         }
> > >         printf("AgentXRequestList::receive :: delete pdu (mark 2)\n");
> > >         delete pdu;
> > >         printf("AgentXRequestList::receive :: after delete pdu (mark 2)\n");
> > >     }
> > >     return 0;
> > >     printf("<==================AgentXRequestList::receive\n");
> > > }
> > >
> > > So, might it be possible that the pdu accessed in the
> > > first (the answer) method is the one that was just
> > > deleted before in the receive method? As far as I can see,
> > > this PDU was assigned to a request object; is that the
> > > same one?
> > >
> > > Is this the bug? Or may it be possible that I use the
> > > framework in a wrong way? (Most of the code dealing with
> > > master and subagent creation and receiving requests I
> > > copied from the example files.) However, I added the
> > > following code in the master agent's receive-request loop:
> > >
> > > if ( sNMP_PDU_SET == pRequest->get_type())
> > > {
> > >     while ( !pRequest->finished() )
> > >     {
> > >         // yeah, we do nothing here: we have to wait until request
> > >         // is completely processed.
> > >         // we can be sure that request will finish some time.
> > >         // This is either when all Variable Bindings have been
> > >         // processed or when an error occurred.
> > >     };
> > > }
> > >
> > > I added this code because otherwise some problems
> > > occurred in the past (hard to remember what).
> > >
> > > To your questions:
> > > 1.) it happens on one variable very quickly and often. But
> > > when performing some stress tests with our software, it
> > > also happens with other variables, but not that quickly.
> > > 2.) one of the subagents is loaded by the master agent,
> > > so it is hard to say. According to the code dump, yes, I am
> > > sure you are right, it is the subagent.
> > > 3.) yes, it is
> > > 4.) no answer required, because 1 is true
> > > 5.) no answer required, because 3 is true
> > >
> > > Unfortunately, I am not in my office on Monday. So
> > > could you please also send the answer to my colleague
> > > ansgar.springub at philips.com?
> > > Thank you very much for your help. Best regards,
> > > Christel
> > >