[SNMP4J] SNMPv3 problem with unhandled responses
Frank Fock
fock at agentpp.com
Mon Oct 27 08:35:50 CET 2014
Hi Peter,
Operarting system and JDK version (including whether it is a Oracle or
3rd party JRE) would be helpful
too.
I will analyze it. Meanwhile, you could try the following fix in Snmp.java:
protected boolean resendRequest(PendingRequest request, PDU response) {
if (request.useNextPDU()) {
request.responseReceived = false;
synchronized (pendingRequests) {
pendingRequests.remove(request.key);
PduHandle holdKeyUntilResendDone = request.key;
request.key = null;
handleInternalResponse(response, request.pdu,
request.target.getAddress());
try {
sendMessage(request.pdu, request.target, request.transport,
request);
}
catch (IOException e) {
logger.error("IOException while resending request after RFC
5343 context engine ID discovery: " +
e.getMessage(), e);
}
// now the previous retry can be released
if (logger.isDebugEnabled()) {
logger.debug("Releasing PDU handle "+holdKeyUntilResendDone);
}
holdKeyUntilResendDone = null;
}
return true;
}
return false;
}
From my analysis, I thought this improvement would not have any effect
on your issue,
but I am not 100% sure. I will have to exeactly analyze the
request/response sequences
to be sure.
Best regards,
Frank
Am 27.10.2014 08:01, schrieb Peter Verthez:
> Hi Frank,
>
> We have now tried with both SNMP4J 2.3.0 and 2.3.1 and we're facing
> the same problem.
>
> Can you investigate this? Is there any more information we could
> provide?
>
> Best regards,
> Peter.
>
>
> On 24/10/2014 8:17, Peter Verthez wrote:
>> OK, thanks, we'll try that (we are on the latest version in our
>> latest release, but the customer is using an older release of our
>> product).
>>
>> Best regards,
>> Peter.
>>
>>
>> On 24/10/2014 2:16, Frank Fock wrote:
>>> Hi Peter,
>>>
>>> Please update to the latest version. There have been several fixes
>>> about race conditions
>>> in the Snmp class regarding retries and the PduHandleCallback. The
>>> current implementation
>>> seems to be OK regarding handling of the Message IDs and the MPv3
>>> Cache WeakHashMap.
>>>
>>> Best regards,
>>> Frank
>>>
>>> Am 23.10.2014 12:34, schrieb Peter Verthez:
>>>> Hi Frank,
>>>>
>>>> One of our customers has reported a problem with SNMPv3. The
>>>> version of our application that they use is using SNMP4J 2.0.1. For
>>>> the particular SNMP request that has the problem, the agent is
>>>> known to respond very slowly, so the timeout model for that
>>>> particular request has been set to:
>>>> - a first timeout after 30 seconds, then a retransmission
>>>> - a second timeout after 60 seconds (so 90 seconds after the
>>>> original request), then a retransmission
>>>> - a third timeout after 120 seconds (210 seconds after the original
>>>> request)
>>>>
>>>> Now, we observe the following behavior (you can also see that in
>>>> the logs that I'm pasting below):
>>>>
>>>> 2014-10-22 10:42:33,428 => Original request
>>>> 2014-10-22 10:43:03,430 => 1st retransmission
>>>> 2014-10-22 10:43:37,444 => Response to original request (rejected)
>>>> 2014-10-22 10:44:03,440 => 2nd retransmission
>>>> 2014-10-22 10:44:41,466 => Response to 1st retransmission (rejected)
>>>> 2014-10-22 10:45:45,539 => Response to 2nd retransmission (accepted)
>>>>
>>>> So the problem that we see is that the first response (the response
>>>> to the original request) is not accepted for some reason. This
>>>> looks similar to the problem SFJ-43, but that was fixed in SNMP4J
>>>> 2.0.0 according to the change log. We are in the process of
>>>> trying to reproduce it on SNMP4J 2.3.0, but I see no relevant
>>>> change in the change log, so I guess there will be no difference.
>>>>
>>>> The logs are the following (with some sensitive information removed):
>>>>
>>>> ------------------------------------------------
>>>> 2014-10-22 10:42:33,426 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.UsmUserTable] Adding
>>>> user <removed> = UsmUser
>>>> [secName=<removed>,authProtocol=1.3.6.1.6.3.10.1.1.3,authPassphrase=<removed>,privProtocol=1.3.6.1.6.3.10.1.2.2,privPassphrase=<removed>,localizationEngineID=null]
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.mp.MPv3] Context engine ID of
>>>> scoped PDU is empty! Setting it to authoritative engine ID:
>>>> 80:00:02:7d:03:00:0e:86:1e:5f:9b
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.USM] getUser
>>>> (engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b, securityName=<removed>)
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.USM] RFC3414 §3.1.4.a
>>>> Outgoing message needs to be encrypted
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Preparing
>>>> decrypt_params.
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Preparing iv
>>>> for encryption.
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Using padding.
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.security.PrivDES] Encryption
>>>> finished.
>>>> 2014-10-22 10:42:33,427 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.mp.MPv3] Adding cache entry:
>>>> StateReference[msgID=1547681362,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
>>>>
>>>> 2014-10-22 10:42:33,428 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.Snmp] Running pending sync
>>>> request with handle PduHandle[1912160112] and retry count left 2
>>>> 2014-10-22 10:42:33,428 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.transport.DefaultUdpTransportMapping]
>>>> Sending message to 124.150.10.52/161 with length 143:
>>>> 30:81:8c:02:01:03:30:11:02:04:5c:3f:be:52:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:
>>>> 00:0e:86:1e:5f:9b:02:01:05:02:03:00:f6:1c:04:09:31:40:72:61:6c:65:69:67:68:04:0c:aa:82:4f:0d:79:70:ab:03:54:1d:8d:07:04:08:
>>>>
>>>> 00:00:00:05:1a:64:0a:89:04:38:56:b3:24:74:26:e8:bc:d0:be:5c:f4:6e:59:d6:ac:2e:35:2d:01:fd:91:3f:be:0e:75:31:aa:96:4d:9b:36:ab:27:e7:65:e6:e3:18:55:
>>>>
>>>> a6:1a:fc:be:cc:22:3c:92:47:6a:06:3b:e7:30:1a:16:2c
>>>>
>>>>
>>>> 2014-10-22 10:43:03,430 DEBUG [Timer-30]-[org.snmp4j.mp.MPv3]
>>>> Adding cache entry:
>>>> StateReference[msgID=1547681458,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
>>>> 2014-10-22 10:43:03,430 DEBUG [Timer-30]-[org.snmp4j.Snmp] Running
>>>> pending sync request with handle PduHandle[1912160112] and retry
>>>> count left 1
>>>> 2014-10-22 10:43:03,430 DEBUG
>>>> [Timer-30]-[org.snmp4j.transport.DefaultUdpTransportMapping]
>>>> Sending message to 124.150.10.52/161 with length 143:
>>>> 30:81:8c:02:01:03:30:11:02:04:5c:3f:be:b2:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:9b:
>>>> 02:01:05:02:03:00:f6:3a:04:09:31:40:72:61:6c:65:69:67:68:04:0c:01:bd:0d:b8:20:96:05:00:f8:52:26:ac:04:08:00:00:00:05:1a:64:0a:
>>>>
>>>> e6:04:38:76:37:c0:dd:e9:97:38:33:11:2d:3e:76:31:a8:67:2e:bb:dd:8b:cd:cc:5a:33:28:ec:53:6c:a2:ab:da:f7:5b:88:9d:98:59:f3:e9:ae:
>>>>
>>>> 2b:d5:12:16:2f:dd:4d:33:5d:c7:6e:19:14:65:f4:2c:60
>>>>
>>>>
>>>> 2014-10-22 10:43:37,444 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping]
>>>> Received message from /124.150.10.52/161 with length 152:
>>>> 30:82:00:94:02:01:03:30:11:02:04:5c:3f:be:52:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:
>>>>
>>>> 9b:02:01:05:02:03:00:f6:5d:04:09:31:40:72:61:6c:65:69:67:68:04:0c:e0:81:9d:7f:48:f7:12:2b:72:28:33:0c:04:08:08:35:08:35:08:35:
>>>>
>>>> 08:35:04:40:16:f0:c4:fd:8a:3a:e6:01:89:de:17:22:d3:75:95:69:9d:81:d6:f8:44:47:97:91:bf:d9:8c:cc:f1:fc:04:5e:47:84:1b:04:ae:37:
>>>>
>>>> b0:3a:58:ec:2b:6f:c9:8f:5d:08:0c:e3:33:f1:7f:5b:ca:95:cf:c0:43:72:c6:9c:79:6c
>>>>
>>>> 2014-10-22 10:43:37,444 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> SNMPv3 header decoded: msgId=1547681362, msgMaxSize=65535,
>>>> msgFlags=07, secModel=3
>>>> 2014-10-22 10:43:37,444 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM]
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
>>>> securityName=<removed>)
>>>> 2014-10-22 10:43:37,444 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable]
>>>> CheckTime: time ok (non authoritative)
>>>> 2014-10-22 10:43:37,444 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681362) is a response or
>>>> internal class message, but cached information for the msgID could
>>>> not be found
>>>> 2014-10-22 10:43:37,444 WARN
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.MessageDispatcherImpl]
>>>> statusInfo=noError, status=-1409
>>>>
>>>>
>>>> 2014-10-22 10:44:03,439 DEBUG [Timer-30]-[org.snmp4j.security.USM]
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
>>>> securityName=<removed>)
>>>> 2014-10-22 10:44:03,439 DEBUG [Timer-30]-[org.snmp4j.security.USM]
>>>> RFC3414 §3.1.4.a Outgoing message needs to be encrypted
>>>> 2014-10-22 10:44:03,439 DEBUG
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Preparing decrypt_params.
>>>> 2014-10-22 10:44:03,439 DEBUG
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Preparing iv for encryption.
>>>> 2014-10-22 10:44:03,440 DEBUG
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Using padding.
>>>> 2014-10-22 10:44:03,440 DEBUG
>>>> [Timer-30]-[org.snmp4j.security.PrivDES] Encryption finished.
>>>> 2014-10-22 10:44:03,440 DEBUG [Timer-30]-[org.snmp4j.mp.MPv3]
>>>> Adding cache entry:
>>>> StateReference[msgID=1547681650,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
>>>> 2014-10-22 10:44:03,440 DEBUG [Timer-30]-[org.snmp4j.Snmp] Running
>>>> pending sync request with handle PduHandle[1912160112] and retry
>>>> count left 0
>>>> 2014-10-22 10:44:03,440 DEBUG
>>>> [Timer-30]-[org.snmp4j.transport.DefaultUdpTransportMapping]
>>>> Sending message to 124.150.10.52/161 with length 143:
>>>> 30:81:8c:02:01:03:30:11:02:04:5c:3f:bf:72:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:9b:
>>>>
>>>> 02:01:05:02:03:00:f6:77:04:09:31:40:72:61:6c:65:69:67:68:04:0c:a7:9a:4e:c6:14:1f:c3:59:6d:7c:50:8d:04:08:00:00:00:05:1a:64:0b:
>>>>
>>>> 9d:04:38:da:3a:05:13:5d:80:b0:9c:fe:41:c5:92:f7:72:9c:9b:fc:01:d0:c3:85:40:b5:93:e2:ab:9c:a0:ca:3a:0a:cf:91:1b:73:e6:89:a3:11:
>>>>
>>>> f2:59:c1:00:3c:ee:a7:07:aa:7d:9f:1e:7f:2e:07:7f:04
>>>> 2014-10-22 10:44:04,109 DEBUG [Timer-30]-[org.snmp4j.Snmp]
>>>> PendingRequest canceled key=null, pdu=null, target=null,
>>>> transport=null, listener=null
>>>> 2014-10-22 10:44:04,129 DEBUG [Timer-30]-[org.snmp4j.Snmp]
>>>> PendingRequest canceled key=null, pdu=null, target=null,
>>>> transport=null, listener=null
>>>> 2014-10-22 10:44:04,129 DEBUG [Timer-30]-[org.snmp4j.Snmp]
>>>> PendingRequest canceled key=null, pdu=null, target=null,
>>>> transport=null, listener=null
>>>>
>>>>
>>>> 2014-10-22 10:44:41,466 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping]
>>>> Received message from /124.150.10.52/161 with length 152:
>>>> 30:82:00:94:02:01:03:30:11:02:04:5c:3f:be:b2:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:
>>>>
>>>> 9b:02:01:05:02:03:00:f6:9d:04:09:31:40:72:61:6c:65:69:67:68:04:0c:1a:d1:a1:f1:89:56:c8:cb:60:58:05:2d:04:08:54:4a:54:4a:54:4a:
>>>>
>>>> 54:4a:04:40:b1:0c:a4:55:56:4c:13:c6:73:a2:f5:70:19:68:83:e8:03:ac:b0:40:0c:c8:8d:c0:fc:7c:b6:b6:0f:81:d8:97:35:ea:d6:7e:6b:47:
>>>>
>>>> 0f:f7:c2:7d:f7:62:1e:14:c0:81:58:9e:22:ba:1d:4a:f2:cd:74:b6:18:b9:1e:93:9f:23
>>>>
>>>> 2014-10-22 10:44:41,466 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> SNMPv3 header decoded: msgId=1547681458, msgMaxSize=65535,
>>>> msgFlags=07, secModel=3
>>>> 2014-10-22 10:44:41,466 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM]
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
>>>> securityName=<removed>)
>>>> 2014-10-22 10:44:41,466 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable]
>>>> CheckTime: time ok (non authoritative)
>>>> 2014-10-22 10:44:41,466 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681458) is a response or
>>>> internal class message, but cached information for the msgID could
>>>> not be found
>>>> 2014-10-22 10:44:41,467 WARN
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.MessageDispatcherImpl]
>>>> statusInfo=noError, status=-1409
>>>>
>>>>
>>>> 2014-10-22 10:45:45,539 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.transport.DefaultUdpTransportMapping]
>>>> Received message from /124.150.10.52/161 with length 152:
>>>> 30:82:00:94:02:01:03:30:11:02:04:5c:3f:bf:72:02:03:00:ff:ff:04:01:07:02:01:03:04:3a:30:38:04:0b:80:00:02:7d:03:00:0e:86:1e:5f:
>>>>
>>>> 9b:02:01:05:02:03:00:f6:dd:04:09:31:40:72:61:6c:65:69:67:68:04:0c:63:92:b3:32:2d:43:06:89:11:1b:bf:41:04:08:79:1a:79:1a:79:1a:
>>>>
>>>> 79:1a:04:40:3e:29:b5:5b:77:09:68:17:85:b1:9a:dc:22:75:cd:6d:f6:d7:3d:6f:a0:0c:6e:85:c5:9c:78:92:97:52:08:e6:83:59:5c:e9:70:44:
>>>>
>>>> 3c:5a:94:6e:f2:3a:f3:19:3e:fa:ba:6c:4f:3e:ed:32:2f:02:de:ce:c3:95:46:0c:79:c4
>>>>
>>>> 2014-10-22 10:45:45,539 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> SNMPv3 header decoded: msgId=1547681650, msgMaxSize=65535,
>>>> msgFlags=07, secModel=3
>>>> 2014-10-22 10:45:45,540 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.USM]
>>>> getUser(engineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,
>>>> securityName=<removed>)
>>>> 2014-10-22 10:45:45,540 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.security.UsmTimeTable]
>>>> CheckTime: time ok (non authoritative)
>>>> 2014-10-22 10:45:45,540 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> Removed cache entry:
>>>> StateReference[msgID=1547681650,pduHandle=PduHandle[1912160112],securityEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,securityModel=org.snmp4j.security.USM at 897ec81,securityName=<removed>,securityLevel=3,contextEngineID=80:00:02:7d:03:00:0e:86:1e:5f:9b,contextName=nt,retryMsgIDs=null]
>>>> 2014-10-22 10:45:45,540 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681650) is a response or
>>>> an internal class message. PduHandle.transactionID = 1912160112
>>>> 2014-10-22 10:45:45,540 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> MPv3 finished
>>>> 2014-10-22 10:45:45,540 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.Snmp]
>>>> Looking up pending request with handle PduHandle[1912160112]
>>>> 2014-10-22 10:45:45,540 DEBUG
>>>> [http-0.0.0.0-8080-30]-[org.snmp4j.Snmp] Removed pending request
>>>> with handle: PduHandle[1912160112]
>>>> ----------------------------------
>>>>
>>>>
>>>> As you can see, the first response is rejected with:
>>>> 2014-10-22 10:43:37,444 DEBUG
>>>> [DefaultUDPTransportMapping_124.150.10.20/0]-[org.snmp4j.mp.MPv3]
>>>> RFC3412 §7.2.10 - Received PDU (msgID=1547681362) is a response or
>>>> internal class message, but cached information for the msgID could
>>>> not be found
>>>>
>>>> However, I don't see a logging line telling that the entry for
>>>> PDUHandle 1912160112 has been removed from the MPv3 cache before
>>>> that line. I would expect:
>>>> - either the logging line from Cache.popEntry ("Removed cache
>>>> entry: ...")
>>>> - or the error line from Snmp.run, where Cache.deleteEntry is
>>>> eventually called ("Failed to send SNMP message to ...")
>>>>
>>>> The entry is only explicitly removed at the very end, after the
>>>> response to the second retransmission.
>>>>
>>>> The only thing I can see (but I could be completely wrong about
>>>> this) is that the MPv3 Cache uses a WeakHashMap, so if there would
>>>> be no other references to those PduHandles, the GC could remove
>>>> this. The longer the timeout is, the bigger the chance that a GC
>>>> will step in and remove the entry. But I don't have a view on the
>>>> overall design: maybe there are references elsewhere.
>>>>
>>>> Could you investigate this issue?
>>>>
>>>> Thanks,
>>>> Peter.
>>>>
>>>
>>
>>
>
>
--
---
AGENT++
Maximilian-Kolbe-Str. 10
73257 Koengen, Germany
https://agentpp.com
Phone: +49 7024 8688230
Fax: +49 7024 8688231
More information about the SNMP4J
mailing list