[SNMP4J] RaceCondition in AgentX Communication
ZAUBZER Stephan
Stephan.ZAUBZER at frequentis.com
Tue Mar 15 12:14:55 CET 2011
Hi,
I have a Problem with the stability of the connection from an AgentXSubAgent
to the MasterAgent. The SubAgent is used in a System running 24/7 (RHEL
kernel 2.6.18, NET-SNMP version: 5.3.2.2). Upon disconnection (detected via a
TransportStateListener) a reconnection is Performed. However due to a race
condition within snmp4j the connection is not properly re-established leaving
the system with 2 connections (one with an established AgentX session and one
"plain TCP connection"). However the connection with the established AgentX
Session is not used, because the DefaultTcpTransportMapping only memorizes
the "last" connection to a certain address.
Details:
The following sequence of events can be observed from the log files (snmp4j
logging has been activated and set to DEBUG level for analyzing this issue):
At some point in time the "request=>response" pattern gets "out of sync" and
we experience a pattern in the form "request1, request2 => response1,
response2". Right after an occurrence of such a pattern the connection seems
to be dropped by the peer. While re-establishing the connection, snmp4j is
running into race-condition preventing a proper re-connection.
For the sake of readability the following log output is manually filtered to
contain only the relevant pieces. Lines beginning with "///" describe the
relevant pieces from the removed log output.
02:51:04,807 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is readable
/// Reading message containing GetPDU with transactionID=13485
02:51:04,807 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is readable
/// Reading message containing GetPDU with transactionID=13487
02:51:04,811 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is writable
/// Sending message containing response with transactionID=13485
02:51:04,811 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is readable
/// Reading message containing GetPDU with transactionID=13489
02:51:04,812 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is writable
/// Obviously trying to send response for transactionID=13487
02:51:04,812 WARN [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: java.io.IOException: Broken
pipe
/// The broken pipe oviously happened when trying to send data.
/// Now the reconnector is kicking in
02:51:04,813 DEBUG [SubAgentXImpl periodic reconnector]
org.snmp4j.transport.DefaultTcpTransportMapping: Socket for address
'127.0.0.1/705' is closed, opening it...
/// Obviously at the same time snmp4j tries to send the last pending
outgoing message (response for transactionID=13489)
02:51:04,813 DEBUG [AgentXSubAgent.0]
org.snmp4j.transport.DefaultTcpTransportMapping: Socket for address
'127.0.0.1/705' is closed, opening it...
///Then the connection for the first opened socket succeeds (local port
56565)
02:51:04,813 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is connectable
///Requesting to write AgentX-Open-PDU
02:51:04,813 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is writable
///Write AgentX-Open-PDU
02:51:04,814 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is readable
///Reading response => AgentX Session opened successfully.
/// Then the connection for the second opened socket succeeds (local port
56566)
02:51:04,814 DEBUG [DefaultTCPTransportMapping_0.0.0.0/0]
org.snmp4j.transport.DefaultTcpTransportMapping: Key is connectable
///This socket now overwrites the first socket (on which the successfully
opened session is relying) within the socket map in
DefaultTcpTransportMapping
///Now the last pending outgoing message from the previous session is sent
on the second opened socket
///As well as the first AgentX-Register-PDU for the new session (again via
the second opened socket).
///The following log entry clearly shows that snmp4j is using the second
opened socket (port 56566) for registering instead of the one on which the
session was opened (port 56565)
02:51:04,815 DEBUG [SubAgentXImpl periodic reconnector]
org.snmp4j.transport.DefaultTcpTransportMapping: Looking up connection for
destination '127.0.0.1/705' returned:
SocketEntry[peerAddress=127.0.0.1/705,socket=Socket[addr=/127.0.0.1,port=705,
localport=56566],lastUse=Tue Mar 15 02:51:04 UTC 2011]
More information about the SNMP4J
mailing list