[AGENT++] Re: recvfrom's blocking and process grinding to a halt

Wed Mar 22 23:11:19 CET 2006

Hi Jochen,

Thanks for the additional info. See my comments inline...

On 3/22/06, Jochen Katz <katz at agentpp.com> wrote:
>
> Hi John,
>
> >> the debugger on the process when this occurs I seem to find that
> >> many threads are blocked in recvfrom calls.
>
> recvfrom is called only from receive_snmp_response() which is only
> called from CSNMPMessageQueue::HandleEvents(), HandleEvents is only
> called from EventListHolder::SNMPProcessPendingEvents(). In this
> function the pevents_mutex synchronizes the calls
>
> m_eventList.GetFdSets()
> select()
> HandleEvents()
>
> So if a response is coming in, GetFdSets returns the fd, select marks it
> as receivable and HandleEvents calls recvfrom.
>
> So I see no reason why recvfrom on a fd with no data is called.

Hmmm, it definitely seems like it's happening from what gdb is showing me.
Keep in mind that I'm using my modified codebase for multiple v3MP objects,
if that changes anything (although I don't think it should matter for
this?).

> >> hundred times.  Is it possible that there is then some sort of race
> >> condition where after doing the select and seeing data is
> >> available from the remote address there is a context switch and a
> >> recvfrom call in another thread is then receiving the data that the
> >> other thread thought was available?  Then when the context switches
> >> back the first thread tries to recvfrom (having already selected)
> >> but the data is gone and it blocks?
>
> From my above analysis: No, but I already had some surprises with the HP
> event code...

I'll keep digging. What you say above makes sense, but it seems to conflict
with what I'm seeing.

> > It seems from my review of the code so far that each thread I have is
> > sometimes calling the GetEvents and HandleEvents calls, but then
> > these are locking a global msgqueue and matching up the unique
> > request id's to place the responses with the appropriate requests
> > even if those requests were made by a different thread than the one
> > that has triggered the HandleEvents.  Is this accurate?
>
> Apart from the word "global" for the msgqueue, this is right. Each Snmp object has
> its own EventListHolder which has its own msg and notification queues.
> So the lock is per Snmp object. If two threads use the same Snmp object
> it is possible that one thread triggers the recvfrom of the other
> thread, but the response is stored in the msg queue and the other thread
> will only get it from there.

Ah, so the msgqueue is per Snmp object.  I create separate Snmp objects
each time I poll a device.  So each thread should have its own Snmp object
at any given time and they are never shared between threads.

This brings up a question.  If the msg queue is per Snmp object, how is the
recvfrom call triggered on that thread going to know that it is getting data
that is intended for its Snmp object and not some other Snmp object in
another thread that shares the same address?  I think this might be my
problem... I have devices being polled on separate threads with separate
Snmp objects but they have the same IP address.  So perhaps the recvfrom
call on one thread is receiving the data that should have been gotten by the
recvfrom on another thread?

To take this back to the race condition issue: if the lock is per Snmp object,
then it seems like it does me no good.  What if 2 threads both do a request to
10.0.0.1 and then both select to see if there is data from that address at
once (they can do this since they have different locks), each of them sees
that there is data, but in reality there is only one datagram waiting for
reading.  Now whichever thread gets to recvfrom first gets that datagram and
the other blocks.  Am I missing something here or is that a potential issue?
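
To make the scenario above concrete, here is a minimal sketch of it in plain
POSIX sockets. It assumes that both selectors end up watching the same UDP
descriptor, which nothing in this thread confirms about SNMP++. The names
RaceResult, is_readable and demonstrate_stale_select are made up for
illustration. The two "threads" are simulated sequentially so the result is
deterministic, and the second read uses MSG_DONTWAIT so it fails with EAGAIN
where a plain blocking recvfrom would hang.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical record of each step of the experiment.
struct RaceResult {
    bool first_select_ready;   // "thread A" saw the fd as readable
    bool second_select_ready;  // "thread B" saw the same fd as readable
    ssize_t first_recv;        // bytes read by whoever called recvfrom first
    ssize_t second_recv;       // -1 once the single datagram is gone
    int second_errno;          // EAGAIN/EWOULDBLOCK instead of blocking
};

// Returns true if select() reports fd as readable within 200 ms.
static bool is_readable(int fd) {
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    timeval tv{0, 200000};
    return select(fd + 1, &rfds, nullptr, nullptr, &tv) > 0 && FD_ISSET(fd, &rfds);
}

RaceResult demonstrate_stale_select() {
    RaceResult r{};
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    assert(rx >= 0 && tx >= 0);  // stands in for real error handling

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    int rc = bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof addr);
    assert(rc == 0);
    socklen_t len = sizeof addr;
    rc = getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &len);
    assert(rc == 0);
    (void)rc;

    // Exactly one datagram arrives (9 bytes including the trailing NUL).
    const char msg[] = "response";
    sendto(tx, msg, sizeof msg, 0, reinterpret_cast<sockaddr*>(&addr), sizeof addr);

    // Both selectors look before either one reads.
    r.first_select_ready = is_readable(rx);
    r.second_select_ready = is_readable(rx);

    char buf[64];
    r.first_recv = recvfrom(rx, buf, sizeof buf, 0, nullptr, nullptr);
    errno = 0;
    // Without MSG_DONTWAIT this call would block, as in the gdb traces.
    r.second_recv = recvfrom(rx, buf, sizeof buf, MSG_DONTWAIT, nullptr, nullptr);
    r.second_errno = errno;

    close(rx);
    close(tx);
    return r;
}
```

If the two threads really have separate sockets, this particular interleaving
cannot happen, so the sketch only shows what a shared descriptor would look
like, not that SNMP++ behaves this way.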

> > If so, for my purposes it seems like I'm wasting a lot of cycles and
> > getting into some threading issues because of each thread trying to
> > check for new data and doing recvfroms, then contending for the global
> > queue.  Wouldn't it be better in a multithreaded scenario to have a
> > single event handler thread that was dedicated to checking the
> > sockets, getting data, and updating the global queue, then just have
> > each thread wait for a response (or error indication) to appear in
> > that queue?
>
> Well, long ago the EventListHolder class was created to remove the
> global singleton event queue. This allows processing of async callbacks
> and notifications for multiple threads.

Makes sense...
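
For reference, the single-receiver idea I asked about above could be sketched
roughly as follows. This is not SNMP++ code: ResponseTable, deliver and
wait_for are hypothetical names, and the payload is a plain string standing in
for a decoded PDU. Only one thread would ever call recvfrom and then deliver();
the polling threads only wait on the table for their own request id.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical shared table: request id -> response payload.
class ResponseTable {
public:
    // Called only by the single receiver thread, after it has read a
    // datagram and extracted the request id from it.
    void deliver(unsigned long request_id, std::string pdu) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            responses_[request_id] = std::move(pdu);
        }
        arrived_.notify_all();  // wake every waiter; each re-checks its own id
    }

    // Called by a polling thread. Blocks until the response for request_id
    // is present or the timeout expires. Returns false on timeout.
    bool wait_for(unsigned long request_id,
                  std::chrono::milliseconds timeout,
                  std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool present = arrived_.wait_for(lock, timeout, [&] {
            return responses_.count(request_id) != 0;
        });
        if (!present) {
            return false;  // maps to an SNMP timeout in the caller
        }
        auto it = responses_.find(request_id);
        assert(it != responses_.end());
        out = std::move(it->second);
        responses_.erase(it);  // each response is consumed exactly once
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable arrived_;
    std::map<unsigned long, std::string> responses_;
};
```

With this shape no polling thread touches a socket, so a thread can only ever
block in wait_for, and only until its timeout. The cost is the one shared lock,
which is the global-queue contention the EventListHolder design was meant to
remove.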

> > And if that makes sense then is there any way to get this
> > type of setup in SNMP++ or would it require me to make some fairly
> > major modifications to the IO handling architecture of the library.  I
>
> It's not as easy as allowing multiple v3MP objects, but as there once
> was a global queue, it can be done. The latest snapshot removes the X11
> and user defined queues from the event code, so it will be easier to
> search through the code.

Ah, I'll check out the snapshot then. It does look like this will be much
harder to fix than the v3MP issue was.

> But before you start digging into the code, some questions, so maybe I
> can find out what's going wrong.
>
> Are you using sync or async requests?

Sync

> Do you use the same Snmp object within different threads at the same time?

No, each thread creates an Snmp object whenever it needs to poll a device
and then that object is destructed when the polling is done.

> Do you get timeouts although the agent sends a reply?

It's hard to know (since I'm polling the same agent hundreds of times as
fast as I can), but I don't seem to be getting any unexpected timeouts.  The
threads just block sometimes.

> Regards,
>   Jochen
> _______________________________________________
> AGENTPP mailing list
> AGENTPP at agentpp.org
> http://lists.agentpp.org/mailman/listinfo/agentpp

John