[AGENT++] Patch to fix Mib::cleanup() and prevent possible deadlock

Claus Klein claus.klein at arcormail.de
Tue Dec 16 22:09:06 CET 2014


Hi Frank,

After a timeout, the master agent closes the connection.
We see a SIGPIPE error and close our session too.
Then we call reqList->terminate_set_requests();
After reconnecting to the master agent, we may call cleanup(), but the lock on the MibTable from the last update() is still active!

As a result, we hang forever at this point in the main loop:
void MibTable::remove_unused_rows()
{
	start_synch();

// … the subagent is no longer usable. Is this a deadlock or not?
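
To illustrate the kind of hang described above, here is a minimal sketch in plain standard C++. It does not use the Agent++ API: TableSketch, cleanup_can_lock and cleanup_succeeds_after_abort are hypothetical names, and the assumption that the aborted request never releases its lock is only my reading of the log, not something confirmed. A worker thread takes a table lock and is then "aborted"; the cleanup path tries to take the same lock with a timeout instead of blocking.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a table guarded by one lock (not the Agent++ MibTable).
struct TableSketch {
    std::timed_mutex lock;
};

// Models the cleanup path: try to take the table lock, but give up after `wait`
// instead of blocking forever. Returns true if the lock was obtained.
bool cleanup_can_lock(TableSketch& table, std::chrono::milliseconds wait) {
    std::unique_lock<std::timed_mutex> guard(table.lock, wait);
    return guard.owns_lock();  // guard releases the lock again on return
}

// Runs one aborted request on a worker thread and reports whether a
// subsequent cleanup could lock the table.
bool cleanup_succeeds_after_abort(bool release_on_abort) {
    TableSketch table;
    std::promise<void> aborted;  // worker -> main: abort handling is done
    std::promise<void> checked;  // main -> worker: cleanup attempt is done
    std::future<void> aborted_f = aborted.get_future();
    std::future<void> checked_f = checked.get_future();

    std::thread worker([&] {
        table.lock.lock();               // the request takes the table lock
        bool held = true;
        if (release_on_abort) {          // abort path that releases the lock
            table.lock.unlock();
            held = false;
        }
        aborted.set_value();
        checked_f.wait();                // keep the owner alive while main checks
        if (held) table.lock.unlock();   // tidy up so the sketch can exit
    });

    aborted_f.wait();
    bool ok = cleanup_can_lock(table, std::chrono::milliseconds(50));
    checked.set_value();
    worker.join();
    return ok;
}
```

If the abort path leaves the lock held, cleanup_succeeds_after_abort(false) gives up after the 50 ms timeout and returns false, whereas a plain blocking lock() at that point would wait for as long as the owner keeps the lock. With release_on_abort set to true the cleanup gets the lock immediately.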

Best Regards,
Claus

: (4)DEBUG  : AgentXSlave: received something on ports
20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
: (2)EVENT  : AgentXRequestList: request received (context)(tid)(pid)(siz)(type)(err)(status): (subagent), (37), (39), (1), (8), (0), (0)
20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
: (2)DEBUG  : LockQueue: adding lock request (ptr): (140319072986512)
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)EVENT  : SubAgentXMib: starting thread execution
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)EVENT  : SubAgentXMib: CLEANUPSET (tid)(pid)(oid)...: (35), (38), (1.3.6.1.4.1.8072.2.2.2.1.5.5.116.101.115.116.50)
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (3)EVENT  : Agent: cleaning up set request: (35)
virtual void Agentpp::netSnmpHostsEntry::cleanup_set_request(Agentpp::Request *, int &)
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)DEBUG  : LockQueue: adding release request (ptr): (140319061504408)
20141216.21:45:50:   00 30 60 03 01 00 00 00                            .0`.....
: (8)DEBUG  : Synchronized: trylock success (id): (2187)
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)DEBUG  : LockQueue: adding release request (ptr): (140319064617872)
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (8)DEBUG  : Synchronized: trylock success (id): (244)
20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
: (1)DEBUG  : TaskManager: task manager found
20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
: (2)DEBUG  : TaskManager: after notify
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)EVENT  : SubAgentXMib: starting thread execution
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)EVENT  : SubAgentXMib: TESTSET (tid)(pid)(oid)...: (37), (39), (1.3.6.1.4.1.8072.2.2.2.1.5.5.116.101.115.116.50)
20141216.21:45:50:   00 10 59 03 01 00 00 00                            ..Y.....
: (3)EVENT  : Agent: preparing set request: (37)
20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
: (4)DEBUG  : AgentXSlave: received something on ports
20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
: (1)ERROR  : AgentXSlave: lost connection with master
20141216.21:45:50:   10 A3 EA 7B FF 7F 00 00                            ...{....
: (2)EVENT  : Mib::cleanup()
virtual void Agentpp::netSnmpHostsEntry::update(Agentpp::Request *)
20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)DEBUG  : LockQueue: adding lock request (ptr): (140319061504408)
20141216.21:45:55:   00 30 60 03 01 00 00 00                            .0`.....
: (8)DEBUG  : Synchronized: trylock success (id): (213)
virtual int Agentpp::netSnmpHostsEntry::prepare_set_request(Agentpp::Request *, int &)
20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
: (4)EVENT  : RequestListAgentX: request answered (id)(status)(tid)(err)(removed)(sz): (37), (257), (37), (12), (0), (1)
20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)DEBUG  : LockQueue: adding release request (ptr): (140319072986512)
20141216.21:45:55:   00 10 59 03 01 00 00 00                            ..Y.....
: (2)EVENT  : SubAgentXMib: finished thread execution

On 16.12.2014, at 00:22, Frank Fock <fock at agentpp.com> wrote:

> Hi Claus,
> 
> It is not a deadlock, because when you continue (end the sleep) everything
> works again. It is simply a global lock which is necessary at that point. 
> 
> Best regards,
> Frank


