[AGENT++] Patch to fix Mib::cleanup() and prevent possible deadlock

Frank Fock fock at agentpp.com
Tue Dec 16 23:29:00 CET 2014


Hi Claus,

The root question to answer is: why is the thread that runs through the
MibEntry::update(..) method terminated without unlocking its locks?
I would expect the SIGPIPE handler to ignore the signal and set a flag
for the main loop to reconnect with the master. The reconnection (and
terminate_set_requests) must then not be started before the requestList
of the sub-agent has run empty.
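
To illustrate the idea, here is a minimal sketch of such a handler and main-loop
check. It assumes a POSIX system. All names in it (g_connection_lost,
PendingRequests, main_loop_step, install_sigpipe_handler) are hypothetical
placeholders and not part of the AGENT++ or AgentX++ API; the real sub-agent
would check its own requestList and perform the actual reconnect where the
comment indicates.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Hypothetical names throughout; none of these are AGENT++ APIs.

// Written by the signal handler, read by the main loop.
static volatile std::sig_atomic_t g_connection_lost = 0;

// The handler only records the event. It must not close sessions,
// terminate requests, or touch any lock.
extern "C" void on_sigpipe(int) { g_connection_lost = 1; }

void install_sigpipe_handler() { std::signal(SIGPIPE, on_sigpipe); }

// Stand-in for the sub-agent's request list: a counter of requests
// that worker threads are still processing.
struct PendingRequests {
    std::atomic<int> in_flight{0};
    bool empty() const { return in_flight.load() == 0; }
};

// One pass of the main loop. Returns true if a reconnect was started.
bool main_loop_step(PendingRequests& requests, int& reconnect_count) {
    if (!g_connection_lost) {
        return false;            // connection is fine, nothing to do
    }
    if (!requests.empty()) {
        return false;            // workers may still hold locks: wait
    }
    // Only now is it safe to terminate set requests and reconnect
    // (assumption: the real calls would go here).
    assert(requests.empty());
    g_connection_lost = 0;
    ++reconnect_count;
    return true;
}
```

The point of the sketch is the ordering: the signal handler defers all work, and
the main loop starts the reconnect only after every running request has finished
and released its locks. Production code would probably install the handler with
sigaction() rather than signal().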

Best regards,
Frank


On 16.12.2014 22:09, Claus Klein wrote:
> Hi Frank,
>
> after a timeout, the master agent closes the connection.
> We see a SIGPIPE error and close our session too.
> Then we call reqList->terminate_set_requests();
> After reconnecting to the master agent, we may call cleanup(), but the lock on the MibTable from the last update() is still held!
>
> As a result, we hang forever at this point in the main loop:
> void MibTable::remove_unused_rows()
> {
> 	start_synch();
>
> // … the subagent is no longer usable; deadlock, or not?
>
> Best Regards,
> Claus
>
> : (4)DEBUG  : AgentXSlave: received something on ports
> 20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
> : (2)EVENT  : AgentXRequestList: request received (context)(tid)(pid)(siz)(type)(err)(status): (subagent), (37), (39), (1), (8), (0), (0)
> 20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
> : (2)DEBUG  : LockQueue: adding lock request (ptr): (140319072986512)
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)EVENT  : SubAgentXMib: starting thread execution
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)EVENT  : SubAgentXMib: CLEANUPSET (tid)(pid)(oid)...: (35), (38), (1.3.6.1.4.1.8072.2.2.2.1.5.5.116.101.115.116.50)
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (3)EVENT  : Agent: cleaning up set request: (35)
> virtual void Agentpp::netSnmpHostsEntry::cleanup_set_request(Agentpp::Request *, int &)
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)DEBUG  : LockQueue: adding release request (ptr): (140319061504408)
> 20141216.21:45:50:   00 30 60 03 01 00 00 00                            .0`.....
> : (8)DEBUG  : Synchronized: trylock success (id): (2187)
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)DEBUG  : LockQueue: adding release request (ptr): (140319064617872)
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (8)DEBUG  : Synchronized: trylock success (id): (244)
> 20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
> : (1)DEBUG  : TaskManager: task manager found
> 20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
> : (2)DEBUG  : TaskManager: after notify
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)EVENT  : SubAgentXMib: starting thread execution
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)EVENT  : SubAgentXMib: TESTSET (tid)(pid)(oid)...: (37), (39), (1.3.6.1.4.1.8072.2.2.2.1.5.5.116.101.115.116.50)
> 20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
> : (3)EVENT  : Agent: preparing set request: (37)
> 20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
> : (4)DEBUG  : AgentXSlave: received something on ports
> 20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
> : (1)ERROR  : AgentXSlave: lost connection with master
> 20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
> : (2)EVENT  : Mib::cleanup()
> virtual void Agentpp::netSnmpHostsEntry::update(Agentpp::Request *)
> 20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)DEBUG  : LockQueue: adding lock request (ptr): (140319061504408)
> 20141216.21:45:55:   00 30 60 03 01 00 00 00                            .0`.....
> : (8)DEBUG  : Synchronized: trylock success (id): (213)
> virtual int Agentpp::netSnmpHostsEntry::prepare_set_request(Agentpp::Request *, int &)
> 20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
> : (4)EVENT  : RequestListAgentX: request answered (id)(status)(tid)(err)(removed)(sz): (37), (257), (37), (12), (0), (1)
> 20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)DEBUG  : LockQueue: adding release request (ptr): (140319072986512)
> 20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
> : (2)EVENT  : SubAgentXMib: finished thread execution
>
> On 16.12.2014, at 00:22, Frank Fock <fock at agentpp.com> wrote:
>
>> Hi Claus,
>>
>> It is not a deadlock, because when you continue (end the sleep) everything
>> works again. It is simply a global lock which is necessary at that point.
>>
>> Best regards,
>> Frank

-- 
---
AGENT++
Maximilian-Kolbe-Str. 10
73257 Koengen, Germany
https://agentpp.com
Phone: +49 7024 8688230
Fax:   +49 7024 8688231


