[AGENT++] Patch to fix Mib::cleanup() and prevent possible deadlock
Frank Fock
fock at agentpp.com
Tue Dec 16 23:29:00 CET 2014
Hi Claus,
The root question to answer is: why is the thread that runs through the
MibEntry::update(..) method terminated without unlocking its locks?
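For illustration only: one common way to make sure a lock is released on every exit path of a method is a scope guard (RAII). The sketch assumes a hypothetical TableStub whose start_synch()/end_synch() are backed by std::mutex, plus a hypothetical SynchGuard and update(); none of these is the AGENT++ API. A guard only helps when the thread leaves update() by returning or throwing. It does not help if the thread is stopped in a way that skips stack unwinding.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a MIB table with start_synch()/end_synch();
// backed by std::mutex purely for illustration.
class TableStub {
public:
    void start_synch() { m_.lock(); held_ = true; }
    void end_synch() { held_ = false; m_.unlock(); }
    // Diagnostic for this single-threaded sketch only (unsynchronized read).
    bool is_held() const { return held_; }
private:
    std::mutex m_;
    bool held_ = false;
};

// Scope guard: takes the lock in the constructor and releases it in the
// destructor, so both a normal return and an exception release it.
class SynchGuard {
public:
    explicit SynchGuard(TableStub& t) : t_(t) { t_.start_synch(); }
    ~SynchGuard() { t_.end_synch(); }
    SynchGuard(const SynchGuard&) = delete;
    SynchGuard& operator=(const SynchGuard&) = delete;
private:
    TableStub& t_;
};

// Hypothetical update() that aborts half-way, e.g. when the connection
// to the master is lost while the request is being processed.
void update(TableStub& table, bool connection_lost) {
    SynchGuard guard(table);
    if (connection_lost) {
        throw std::runtime_error("lost connection with master");
    }
    // ... normal update work while the table is locked ...
}
```

If update() leaves via the exception, the guard's destructor still calls end_synch(), so a later start_synch() (as in remove_unused_rows()) does not block on a leaked lock.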
I would expect the SIGPIPE handler to ignore the signal and set a flag
for the main loop to reconnect with the master. The reconnection (and
terminate_set_requests) must then not be started before the requestList
of the sub-agent has run empty.
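A minimal sketch of that suggestion, assuming a hypothetical RequestListStub that only counts pending requests and a hypothetical main_loop_step() called once per main-loop iteration; neither is an AGENT++ class. Here "ignore the signal" is read as "catch it so the default action (process termination) does not run, and only record it". The handler writes nothing but a volatile std::sig_atomic_t, which is safe from a signal handler. The main loop reconnects only once the pending count has dropped to zero.

```cpp
#include <atomic>
#include <csignal>
#include <cstddef>

// Set from the SIGPIPE handler, read by the main loop. Writing a
// volatile std::sig_atomic_t is async-signal-safe.
volatile std::sig_atomic_t lost_master = 0;

// Instead of letting SIGPIPE terminate the process, only record it.
extern "C" void on_sigpipe(int) { lost_master = 1; }

// Hypothetical stand-in for the sub-agent's request list.
struct RequestListStub {
    std::atomic<std::size_t> pending{0};
};

enum class Step { Idle, WaitingForDrain, Reconnected };

// One iteration of a hypothetical main loop.
Step main_loop_step(RequestListStub& requests, int& reconnects) {
    if (!lost_master) {
        return Step::Idle;
    }
    if (requests.pending.load() > 0) {
        // Worker threads may still hold MIB locks: do not reconnect yet.
        return Step::WaitingForDrain;
    }
    lost_master = 0;
    ++reconnects;  // placeholder for terminate_set_requests() + reconnect
    return Step::Reconnected;
}
```

The handler would be installed once at startup with std::signal(SIGPIPE, on_sigpipe). In a real sub-agent the increment of reconnects would be replaced by the call to terminate_set_requests() and the actual reconnect.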
Best regards,
Frank
On 16.12.2014 22:09, Claus Klein wrote:
> Hi Frank,
>
> After a timeout, the master agent closes the connection.
> We see a SIGPIPE error and close our session too.
> Then we call reqList->terminate_set_requests();
> After reconnecting to the master agent, we may call cleanup(), but the lock on the MIB table from the last update() is still held!
>
> As a result, the main loop hangs forever at this point:
> void MibTable::remove_unused_rows()
> {
> start_synch();
>
> // … the subagent is no longer usable: deadlock, or not?
>
> Best Regards,
> Claus
>
> : (4)DEBUG : AgentXSlave: received something on ports
> 20141216.21:45:50: 10 A3 EA 7B FF 7F 00 00 ...{....
> : (2)EVENT : AgentXRequestList: request received (context)(tid)(pid)(siz)(type)(err)(status): (subagent), (37), (39), (1), (8), (0), (0)
> 20141216.21:45:50: 10 A3 EA 7B FF 7F 00 00 ...{....
> : (2)DEBUG : LockQueue: adding lock request (ptr): (140319072986512)
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)EVENT : SubAgentXMib: starting thread execution
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)EVENT : SubAgentXMib: CLEANUPSET (tid)(pid)(oid)...: (35), (38), (1.3.6.1.4.1.8072.2.2.2.1.5.5.116.101.115.116.50)
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (3)EVENT : Agent: cleaning up set request: (35)
> virtual void Agentpp::netSnmpHostsEntry::cleanup_set_request(Agentpp::Request *, int &)
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)DEBUG : LockQueue: adding release request (ptr): (140319061504408)
> 20141216.21:45:50: 00 30 60 03 01 00 00 00 .0`.....
> : (8)DEBUG : Synchronized: trylock success (id): (2187)
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)DEBUG : LockQueue: adding release request (ptr): (140319064617872)
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (8)DEBUG : Synchronized: trylock success (id): (244)
> 20141216.21:45:50: 10 A3 EA 7B FF 7F 00 00 ...{....
> : (1)DEBUG : TaskManager: task manager found
> 20141216.21:45:50: 10 A3 EA 7B FF 7F 00 00 ...{....
> : (2)DEBUG : TaskManager: after notify
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)EVENT : SubAgentXMib: starting thread execution
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)EVENT : SubAgentXMib: TESTSET (tid)(pid)(oid)...: (37), (39), (1.3.6.1.4.1.8072.2.2.2.1.5.5.116.101.115.116.50)
> 20141216.21:45:50: 00 10 59 03 01 00 00 00 ..Y.....
> : (3)EVENT : Agent: preparing set request: (37)
> 20141216.21:45:50: 10 A3 EA 7B FF 7F 00 00 ...{....
> : (4)DEBUG : AgentXSlave: received something on ports
> 20141216.21:45:50: 10 A3 EA 7B FF 7F 00 00 ...{....
> : (1)ERROR : AgentXSlave: lost connection with master
> 20141216.21:45:50: 10 A3 EA 7B FF 7F 00 00 ...{....
> : (2)EVENT : Mib::cleanup()
> virtual void Agentpp::netSnmpHostsEntry::update(Agentpp::Request *)
> 20141216.21:45:55: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)DEBUG : LockQueue: adding lock request (ptr): (140319061504408)
> 20141216.21:45:55: 00 30 60 03 01 00 00 00 .0`.....
> : (8)DEBUG : Synchronized: trylock success (id): (213)
> virtual int Agentpp::netSnmpHostsEntry::prepare_set_request(Agentpp::Request *, int &)
> 20141216.21:45:55: 00 10 59 03 01 00 00 00 ..Y.....
> : (4)EVENT : RequestListAgentX: request answered (id)(status)(tid)(err)(removed)(sz): (37), (257), (37), (12), (0), (1)
> 20141216.21:45:55: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)DEBUG : LockQueue: adding release request (ptr): (140319072986512)
> 20141216.21:45:55: 00 10 59 03 01 00 00 00 ..Y.....
> : (2)EVENT : SubAgentXMib: finished thread execution
>
> On 16.12.2014, at 00:22, Frank Fock <fock at agentpp.com> wrote:
>
>> Hi Claus,
>>
>> It is not a deadlock, because when you continue (end the sleep), everything
>> works again. It is simply a global lock, which is necessary at that point.
>>
>> Best regards,
>> Frank
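To illustrate the distinction between a global lock held by a paused thread and a real deadlock, here is a small sketch that uses std::timed_mutex as a stand-in for the AGENT++ global lock (an assumption; the library uses its own Synchronized class, as seen in the log). The names try_global_lock(), LockDemoResult and demo_paused_holder() are hypothetical. A holder thread that is merely paused blocks other threads only until it continues; a bounded try_lock_for() makes the wait observable instead of hanging.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Tries to take the lock within `timeout`; releases it again on success.
bool try_global_lock(std::timed_mutex& m, std::chrono::milliseconds timeout) {
    if (m.try_lock_for(timeout)) {
        m.unlock();
        return true;
    }
    return false;
}

struct LockDemoResult {
    bool acquired_while_held;     // expected false: holder is paused
    bool acquired_after_release;  // expected true: holder continued
};

LockDemoResult demo_paused_holder() {
    std::timed_mutex global_lock;  // stand-in for the global lock
    std::promise<void> locked;     // holder -> main: lock taken
    std::promise<void> release;    // main -> holder: continue
    std::future<void> locked_f = locked.get_future();
    std::future<void> release_f = release.get_future();

    std::thread holder([&] {
        std::lock_guard<std::timed_mutex> guard(global_lock);
        locked.set_value();
        release_f.wait();  // simulates the paused ("sleeping") thread
    });                    // guard releases the lock when the lambda returns

    locked_f.wait();
    LockDemoResult r{};
    r.acquired_while_held =
        try_global_lock(global_lock, std::chrono::milliseconds(20));
    release.set_value();  // let the holder continue
    holder.join();
    r.acquired_after_release =
        try_global_lock(global_lock, std::chrono::milliseconds(20));
    return r;
}
```

In the scenario from the log, the holder never continues, so an unbounded wait such as the one in start_synch() would block indefinitely.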
--
---
AGENT++
Maximilian-Kolbe-Str. 10
73257 Koengen, Germany
https://agentpp.com
Phone: +49 7024 8688230
Fax: +49 7024 8688231