[SNMP4J] Problem with timeout on SNMPv3 message

Peter Verthez Peter.Verthez at alcatel-lucent.com
Wed Dec 9 16:38:04 CET 2009


Hello,

We are currently using SNMP4J version 1.10.1, and we are facing a
problem with SNMPv3 messages that never seem to return from the SNMP4J
library.    We are using synchronous messages with a custom timeout
model of 8 + 16 + 32 seconds (i.e. 2 retries, with a timeout that
doubles on each retry).

The exact circumstance is the following:
- we are setting some attribute on an agent that will cause the agent to
restart; this set request is successful
- directly afterwards we are retrieving some attributes from that agent,
and typically that should give a timeout (it does in SNMPv2)
- but in SNMPv3, this agent is at some point responding with an SNMP
report with the snmpUnknownContexts variable, presumably because it is
at that moment restarting
--> due to this, the synchronous message never returns from the SNMP4J
library

I have the following tracing: first a Wireshark list with some comments,
showing the exact sequence (unrelated messages are removed from this list):

No. Time Source Destination Protocol Info
      1 0.000000 172.31.110.52 172.21.185.14 SNMP set-request
                -- this is the original set request (request id
214435902; SNMPv3 msg id 543834127)
      2 0.237779 172.21.185.14 172.31.110.52 SNMP get-response
                -- response on the set request (request id 214435902;
SNMPv3 msg id 543834127)
      3 0.423487 172.31.110.52 172.21.185.14 SNMP get-request
                -- the get request to reload the object details (request
id 214435904; SNMPv3 msg id 543834128)
      4 8.425570 172.31.110.52 172.21.185.14 SNMP get-request
                -- first retry on the get request (request id 214435904;
SNMPv3 msg id 543834129)
      6 24.431112 172.31.110.52 172.21.185.14 SNMP get-request
                -- second retry on the get request (request id
214435904; SNMPv3 msg id 543834130)
      9 31.247375 172.21.185.14 172.31.110.52 SNMP report
                -- a report of snmpUnknownContexts = 8 on SNMPv3 msg id
543834129
     11 31.253862 172.21.185.14 172.31.110.52 SNMP report
                -- a report of snmpUnknownContexts = 9 on SNMPv3 msg id
543834130

As you see, the first retry is sent after 8 seconds, the second after 8
+ 16 = 24 seconds, and the report that causes the problem is coming at
31 seconds (the first report is ignored properly because that SNMPv3
message already timed out).

I see the following in our logs after setting org.snmp4j logs to DEBUG
(this is for the second SNMP report):

2009-12-09 15:20:30,912 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.transport.DefaultUdpTransportMapping] 

Received message from /172.21.185.14/161 with length 112:
30:6e:02:01:03:30:11:02:04:20:6a:40:12:02:03:00:ff:ff:04:01:00:02:01:03:04:26:30:24:04:0b:80:00:02:7d:03:00:90:d0:6d:fa:b0:02:01:01:02:02:1e:40:04:0a:61:6d:73:61:6e:74:77:65:72:70:04:00:04:00:30:2e:04:0b:80:00:02:7d:03:00:90:d0:6d:fa:b0:04:02:6e:74:a8:1b:02:01:00:02:01:00:02:01:00:30:10:30:0e:06:09:2b:06:01:06:03:0c:01:05:00:41:01:09 


2009-12-09 15:20:30,912 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.mp.MPv3] SNMPv3
header decoded: msgId=543834130, msgMaxSize=65535, msgFlags=00, secModel=3
2009-12-09 15:20:30,912 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.security.USM]
getUser(engineID=80:00:02:7d:03:00:90:d0:6d:fa:b0, securityName=amsantwerp)
2009-12-09 15:20:30,912 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.mp.MPv3]
Removed cache entry:
StateReference[msgID=543834130,pduHandle=PduHandle[214435904],securityEngineID=80:00:02:7d:03:00:90:d0:6d:fa:b0,securityModel=org.snmp4j.security.USM at 1733af8,securityName=amsantwerp,securityLevel=1,contextEngineID=80:00:02:7d:03:00:90:d0:6d:fa:b0,contextName=xvps2] 


2009-12-09 15:20:30,995 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.mp.MPv3]
RFC3412 §7.2.10 - Received PDU (msgID=543834130) is a response or an
internal class message. PduHandle.transactionID = 214435904
2009-12-09 15:20:30,995 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.mp.MPv3] MPv3
finished
2009-12-09 15:20:30,996 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.Snmp] Searching
pending request with handlePduHandle[214435904]
2009-12-09 15:20:30,996 INFO
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.Snmp] Received
late report from 172.21.185.14/161 with request ID 0


This seems to say that the second report is also a report that is not
in-time, but the report is coming well before the second retry should
time out.   Is this a bug in SNMP4J?    I tried to trace into the SNMP4J
code, but I couldn't find anything obvious...

Note that for the first report, the following is printed, which seems to
be correct:

2009-12-09 15:20:30,912 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.security.USM]
getUser(engineID=80:00:02:7d:03:00:90:d0:6d:fa:b0, securityName=amsantwerp)
2009-12-09 15:20:30,912 DEBUG
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.mp.MPv3]
RFC3412 §7.2.10 - Received PDU (msgID=543834129) is a response or
internal class message, but cached information for the msgID could not
be found
2009-12-09 15:20:30,912 WARN
[DefaultUDPTransportMapping_172.31.110.52/0]-[org.snmp4j.MessageDispatcherImpl] 

noError

Best regards,
Peter.

-- 
Peter Verthez
Systems Engineer Network Mgt.
Alcatel-Lucent Bell NV





More information about the SNMP4J mailing list