[SNMP4J] SNMPv3 problem with unhandled responses

Frank Fock fock at agentpp.com
Mon Oct 27 08:35:50 CET 2014


Hi Peter,

Operarting system and JDK version (including whether it is a Oracle or 
3rd party JRE) would be helpful
too.

I will analyze it. Meanwhile, you could try the following fix in Snmp.java:

   protected boolean resendRequest(PendingRequest request, PDU response) {
     if (request.useNextPDU()) {
       request.responseReceived = false;
       synchronized (pendingRequests) {
         pendingRequests.remove(request.key);
         PduHandle holdKeyUntilResendDone = request.key;
         request.key = null;
         handleInternalResponse(response, request.pdu, 
request.target.getAddress());
         try {
           sendMessage(request.pdu, request.target, request.transport, 
request);
         }
         catch (IOException e) {
           logger.error("IOException while resending request after RFC 
5343 context engine ID discovery: " +
               e.getMessage(), e);
         }
         // now the previous retry can be released
         if (logger.isDebugEnabled()) {
           logger.debug("Releasing PDU handle "+holdKeyUntilResendDone);
         }
         holdKeyUntilResendDone = null;
       }
       return true;
     }
     return false;
   }


 From my analysis, I thought this improvement would not have any effect 
on your issue,
but I am not 100% sure. I will have to exeactly analyze the 
request/response sequences
to be sure.

Best regards,
Frank


Am 27.10.2014 08:01, schrieb Peter Verthez:
> Hi Frank,
>
> We have now tried with both SNMP4J 2.3.0 and 2.3.1 and we're facing 
> the same problem.
>
> Can you investigate this?   Is there any more information we could 
> provide?
>
> Best regards,
> Peter.
>
>
> On 24/10/2014 8:17, Peter Verthez wrote:
>> OK, thanks, we'll try that (we are on the latest version in our 
>> latest release, but the customer is using an older release of our 
>> product).
>>
>> Best regards,
>> Peter.
>>
>>
>> On 24/10/2014 2:16, Frank Fock wrote:
>>> Hi Peter,
>>>
>>> Please update to the latest version. There have been several fixes 
>>> about race conditions
>>> in the Snmp class regarding retries and the PduHandleCallback. The 
>>> current implementation
>>> seems to be OK regarding handling of the Message IDs and the MPv3 
>>> Cache WeakHashMap.
>>>
>>> Best regards,
>>> Frank
>>>
>>> Am 23.10.2014 12:34, schrieb Peter Verthez:
>>>> Hi Frank,
>>>>
>>>> One of our customers has reported a problem with SNMPv3. The 
>>>> version of our application that they use is using SNMP4J 2.0.1. For 
>>>> the particular SNMP request that has the problem, the agent is 
>>>> known to respond very slowly, so the timeout model for that 
>>>> particular request has been set to:
>>>> - a first timeout after 30 seconds, then a retransmission
>>>> - a second timeout after 60 seconds (so 90 seconds after the 
>>>> original request), then a retransmission
>>>> - a third timeout after 120 seconds (210 seconds after the original 
>>>> request)
>>>>
>>>> Now, we observe the following behavior (you can also see that in 
>>>> the logs that I'm pasting below):
>>>>
>>>> 2014-10-22 10:42:33,428 => Original request
>>>> 2014-10-22 10:43:03,430 => 1st retransmission
>>>> 2014-10-22 10:43:37,444 => Response to original request (rejected)
>>>> 2014-10-22 10:44:03,440 => 2nd retransmission
>>>> 2014-10-22 10:44:41,466 => Response to 1st retransmission (rejected)
>>>> 2014-10-22 10:45:45,539 => Response to 2nd retransmission (accepted)
>>>>
>>>> So the problem that we see is that the first response (the response 
>>>> to the original request) is not accepted for some reason.   This 
>>>> looks similar to the problem SFJ-43, but that was fixed in SNMP4J 
>>>> 2.0.0 according to the change log.   We are in the process of 
>>>> trying to reproduce it on SNMP4J 2.3.0, but I see no relevant 
>>>> change in the change log, so I guess there will be no difference.
>>>>
>>>> The logs are the following (with some sensitive information removed):
>>>>
>>>> ------------------------------------------------
>>>> 2014-10-22 10:42:33,426 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.UsmUserTable] Adding 
>>>> user <removed> = UsmUser 
>>>> [secName=<removed>,authProtocol=1.3.6.1.6.3.10.1.1.3,authPassphrase=<removed>,privProtocol=1.3.6.1.6.3.10.1.2.2,privPassphrase=<removed>,localizationEngineID=null]
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.mp.MPv3] Context engine ID of 
>>>> scoped PDU is empty! Setting it to authoritative engine ID: 
>>>> 80:00:02:7d:03:00:0e:86:1e:5f:9b
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.USM] getUser 
>>>> (engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b, securityName=<removed>)
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.USM] RFC3414 §3.1.4.a 
>>>> Outgoing message needs to be encrypted
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Preparing 
>>>> decrypt_params.
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Preparing iv 
>>>> for encryption.
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Using padding.
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Encryption 
>>>> finished.
>>>> 2014-10-22 10:42:33,427 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.mp.MPv3] Adding cache entry: 
>>>> StateReference[msgID=1547681362,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null] 
>>>>
>>>> 2014-10-22 10:42:33,428 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.Snmp] Running pending sync 
>>>> request with handle PduHandle[1912160112] and retry count left 2
>>>> 2014-10-22 10:42:33,428 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.transport.DefaultUdpTransportMapping] 
>>>> Sending message to 124.150.10.52/161 with length 143: 
>>>> 30:81:8c:02:01:03:30:11:02:04:5c:3f:be:52:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:
>>>> 00:0e:86:1e:5f:9b:02:01:05:02:03:00:f6:1c:04:09:31:40:72:61:6c:65:69:67:68:04:0c:aa:82:4f:0d:79:70:ab:03:54:1d:8d:07:04:08: 
>>>>
>>>> 00:00:00:05:1a:64:0a:89:04:38:56:b3:24:74:26:e8:bc:d0:be:5c:f4:6e:59:d6:ac:2e:35:2d:01:fd:91:3f:be:0e:75:31:aa:96:4d:9b:36:ab:27:e7:65:e6:e3:18:55: 
>>>>
>>>> a6:1a:fc:be:cc:22:3c:92:47:6a:06:3b:e7:30:1a:16:2c
>>>>
>>>>
>>>> 2014-10-22 10:43:03,430 DEBUG [Timer-30]-[org.snmp4j.mp.MPv3] 
>>>> Adding cache entry: 
>>>> StateReference[msgID=1547681458,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
>>>> 2014-10-22 10:43:03,430 DEBUG [Timer-30]-[org.snmp4j.Snmp] Running 
>>>> pending sync request with handle PduHandle[1912160112] and retry 
>>>> count left 1
>>>> 2014-10-22 10:43:03,430 DEBUG 
>>>> [Timer-30]-[org.snmp4j.transport.DefaultUdpTransportMapping] 
>>>> Sending message to 124.150.10.52/161 with length 143: 
>>>> 30:81:8c:02:01:03:30:11:02:04:5c:3f:be:b2:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:9b:
>>>> 02:01:05:02:03:00:f6:3a:04:09:31:40:72:61:6c:65:69:67:68:04:0c:01:bd:0d:b8:20:96:05:00:f8:52:26:ac:04:08:00:00:00:05:1a:64:0a: 
>>>>
>>>> e6:04:38:76:37:c0:dd:e9:97:38:33:11:2d:3e:76:31:a8:67:2e:bb:dd:8b:cd:cc:5a:33:28:ec:53:6c:a2:ab:da:f7:5b:88:9d:98:59:f3:e9:ae: 
>>>>
>>>> 2b:d5:12:16:2f:dd:4d:33:5d:c7:6e:19:14:65:f4:2c:60
>>>>
>>>>
>>>> 2014-10-22 10:43:37,444 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping] 
>>>> Received message from /124.150.10.52/161 with length 152:
>>>> 30:82:00:94:02:01:03:30:11:02:04:5c:3f:be:52:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f: 
>>>>
>>>> 9b:02:01:05:02:03:00:f6:5d:04:09:31:40:72:61:6c:65:69:67:68:04:0c:e0:81:9d:7f:48:f7:12:2b:72:28:33:0c:04:08:08:35:08:35:08:35: 
>>>>
>>>> 08:35:04:40:16:f0:c4:fd:8a:3a:e6:01:89:de:17:22:d3:75:95:69:9d:81:d6:f8:44:47:97:91:bf:d9:8c:cc:f1:fc:04:5e:47:84:1b:04:ae:37: 
>>>>
>>>> b0:3a:58:ec:2b:6f:c9:8f:5d:08:0c:e3:33:f1:7f:5b:ca:95:cf:c0:43:72:c6:9c:79:6c 
>>>>
>>>> 2014-10-22 10:43:37,444 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> SNMPv3 header decoded: msgId=1547681362, msgMaxSize=65535, 
>>>> msgFlags=07, secModel=3
>>>> 2014-10-22 10:43:37,444 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM] 
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b, 
>>>> securityName=<removed>)
>>>> 2014-10-22 10:43:37,444 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable] 
>>>> CheckTime: time ok (non authoritative)
>>>> 2014-10-22 10:43:37,444 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681362) is a response or 
>>>> internal class message, but cached information for the msgID could 
>>>> not be found
>>>> 2014-10-22 10:43:37,444 WARN 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.MessageDispatcherImpl] 
>>>> statusInfo=noError, status=-1409
>>>>
>>>>
>>>> 2014-10-22 10:44:03,439 DEBUG [Timer-30]-[org.snmp4j.security.USM] 
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b, 
>>>> securityName=<removed>)
>>>> 2014-10-22 10:44:03,439 DEBUG [Timer-30]-[org.snmp4j.security.USM] 
>>>> RFC3414 §3.1.4.a Outgoing message needs to be encrypted
>>>> 2014-10-22 10:44:03,439 DEBUG 
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Preparing decrypt_params.
>>>> 2014-10-22 10:44:03,439 DEBUG 
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Preparing iv for encryption.
>>>> 2014-10-22 10:44:03,440 DEBUG 
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Using padding.
>>>> 2014-10-22 10:44:03,440 DEBUG 
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Encryption finished.
>>>> 2014-10-22 10:44:03,440 DEBUG [Timer-30]-[org.snmp4j.mp.MPv3] 
>>>> Adding cache entry: 
>>>> StateReference[msgID=1547681650,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
>>>> 2014-10-22 10:44:03,440 DEBUG [Timer-30]-[org.snmp4j.Snmp] Running 
>>>> pending sync request with handle PduHandle[1912160112] and retry 
>>>> count left 0
>>>> 2014-10-22 10:44:03,440 DEBUG 
>>>> [Timer-30]-[org.snmp4j.transport.DefaultUdpTransportMapping] 
>>>> Sending message to 124.150.10.52/161 with length 143:
>>>> 30:81:8c:02:01:03:30:11:02:04:5c:3f:bf:72:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:9b: 
>>>>
>>>> 02:01:05:02:03:00:f6:77:04:09:31:40:72:61:6c:65:69:67:68:04:0c:a7:9a:4e:c6:14:1f:c3:59:6d:7c:50:8d:04:08:00:00:00:05:1a:64:0b: 
>>>>
>>>> 9d:04:38:da:3a:05:13:5d:80:b0:9c:fe:41:c5:92:f7:72:9c:9b:fc:01:d0:c3:85:40:b5:93:e2:ab:9c:a0:ca:3a:0a:cf:91:1b:73:e6:89:a3:11: 
>>>>
>>>> f2:59:c1:00:3c:ee:a7:07:aa:7d:9f:1e:7f:2e:07:7f:04
>>>> 2014-10-22 10:44:04,109 DEBUG [Timer-30]-[org.snmp4j.Snmp] 
>>>> PendingRequest canceled key=null, pdu=null, target=null, 
>>>> transport=null, listener=null
>>>> 2014-10-22 10:44:04,129 DEBUG [Timer-30]-[org.snmp4j.Snmp] 
>>>> PendingRequest canceled key=null, pdu=null, target=null, 
>>>> transport=null, listener=null
>>>> 2014-10-22 10:44:04,129 DEBUG [Timer-30]-[org.snmp4j.Snmp] 
>>>> PendingRequest canceled key=null, pdu=null, target=null, 
>>>> transport=null, listener=null
>>>>
>>>>
>>>> 2014-10-22 10:44:41,466 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping] 
>>>> Received message from /124.150.10.52/161 with length 152:
>>>> 30:82:00:94:02:01:03:30:11:02:04:5c:3f:be:b2:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f: 
>>>>
>>>> 9b:02:01:05:02:03:00:f6:9d:04:09:31:40:72:61:6c:65:69:67:68:04:0c:1a:d1:a1:f1:89:56:c8:cb:60:58:05:2d:04:08:54:4a:54:4a:54:4a: 
>>>>
>>>> 54:4a:04:40:b1:0c:a4:55:56:4c:13:c6:73:a2:f5:70:19:68:83:e8:03:ac:b0:40:0c:c8:8d:c0:fc:7c:b6:b6:0f:81:d8:97:35:ea:d6:7e:6b:47: 
>>>>
>>>> 0f:f7:c2:7d:f7:62:1e:14:c0:81:58:9e:22:ba:1d:4a:f2:cd:74:b6:18:b9:1e:93:9f:23 
>>>>
>>>> 2014-10-22 10:44:41,466 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> SNMPv3 header decoded: msgId=1547681458, msgMaxSize=65535, 
>>>> msgFlags=07, secModel=3
>>>> 2014-10-22 10:44:41,466 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM] 
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b, 
>>>> securityName=<removed>)
>>>> 2014-10-22 10:44:41,466 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable] 
>>>> CheckTime: time ok (non authoritative)
>>>> 2014-10-22 10:44:41,466 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681458) is a response or 
>>>> internal class message, but cached information for the msgID could 
>>>> not be found
>>>> 2014-10-22 10:44:41,467 WARN 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.MessageDispatcherImpl] 
>>>> statusInfo=noError, status=-1409
>>>>
>>>>
>>>> 2014-10-22 10:45:45,539 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping] 
>>>> Received message from /124.150.10.52/161 with length 152:
>>>> 30:82:00:94:02:01:03:30:11:02:04:5c:3f:bf:72:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f: 
>>>>
>>>> 9b:02:01:05:02:03:00:f6:dd:04:09:31:40:72:61:6c:65:69:67:68:04:0c:63:92:b3:32:2d:43:06:89:11:1b:bf:41:04:08:79:1a:79:1a:79:1a: 
>>>>
>>>> 79:1a:04:40:3e:29:b5:5b:77:09:68:17:85:b1:9a:dc:22:75:cd:6d:f6:d7:3d:6f:a0:0c:6e:85:c5:9c:78:92:97:52:08:e6:83:59:5c:e9:70:44: 
>>>>
>>>> 3c:5a:94:6e:f2:3a:f3:19:3e:fa:ba:6c:4f:3e:ed:32:2f:02:de:ce:c3:95:46:0c:79:c4 
>>>>
>>>> 2014-10-22 10:45:45,539 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> SNMPv3 header decoded: msgId=1547681650, msgMaxSize=65535, 
>>>> msgFlags=07, secModel=3
>>>> 2014-10-22 10:45:45,540 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM] 
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b, 
>>>> securityName=<removed>)
>>>> 2014-10-22 10:45:45,540 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable] 
>>>> CheckTime: time ok (non authoritative)
>>>> 2014-10-22 10:45:45,540 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> Removed cache entry: 
>>>> StateReference[msgID=1547681650,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
>>>> 2014-10-22 10:45:45,540 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681650) is a response or 
>>>> an internal class message. PduHandle.transactionID = 1912160112
>>>> 2014-10-22 10:45:45,540 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> MPv3 finished
>>>> 2014-10-22 10:45:45,540 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.Snmp] 
>>>> Looking up pending request with handle PduHandle[1912160112]
>>>> 2014-10-22 10:45:45,540 DEBUG 
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.Snmp] Removed pending request 
>>>> with handle: PduHandle[1912160112]
>>>> ----------------------------------
>>>>
>>>>
>>>> As you can see, the first response is rejected with:
>>>> 2014-10-22 10:43:37,444 DEBUG 
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3] 
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681362) is a response or 
>>>> internal class message, but cached information for the msgID could 
>>>> not be found
>>>>
>>>> However, I don't see a logging line telling that the entry for 
>>>> PDUHandle 1912160112 has been removed from the MPv3 cache before 
>>>> that line.   I would expect:
>>>> - either the logging line from Cache.popEntry ("Removed cache 
>>>> entry: ...")
>>>> - or the error line from Snmp.run, where Cache.deleteEntry is 
>>>> eventually called ("Failed to send SNMP message to ...")
>>>>
>>>> The entry is only explicitly removed at the very end, after the 
>>>> response to the second retransmission.
>>>>
>>>> The only thing I can see (but I could be completely wrong about 
>>>> this) is that the MPv3 Cache uses a WeakHashMap, so if there would 
>>>> be no other references to those PduHandles, the GC could remove 
>>>> this.   The longer the timeout is, the bigger the chance that a GC 
>>>> will step in and remove the entry.   But I don't have a view on the 
>>>> overall design: maybe there are references elsewhere.
>>>>
>>>> Could you investigate this issue?
>>>>
>>>> Thanks,
>>>> Peter.
>>>>
>>>
>>
>>
>
>

-- 
---
AGENT++
Maximilian-Kolbe-Str. 10
73257 Koengen, Germany
https://agentpp.com
Phone: +49 7024 8688230
Fax:   +49 7024 8688231




More information about the SNMP4J mailing list