[SNMP4J] IllegalStateException
Frank Fock
fock at agentpp.com
Thu Aug 19 14:17:52 CEST 2004
Hi Brian,
In the meantime I have written a JUnit test for the timeout
problem. It turned out that besides the below fix, the
processPdu needs to be fixed too as follows:
public void processPdu(CommandResponderEvent event) {
PduHandle handle = event.getPduHandle();
PDU pdu = event.getPdu();
if (pdu.getType() == PDU.RESPONSE) {
event.setProcessed(true);
PendingRequest request;
boolean intime = true;
synchronized (sync) {
if (logger.isDebugEnabled()) {
logger.debug("Removing pending request with handle" +
handle.getTransactionID());
}
request = (PendingRequest) pendingRequests.remove(handle);
if (request == null) {
logger.warn("Received response that cannot be matched to any "+
"outstanding request, address="+
event.getPeerAddress() +
", requestID="+pdu.getRequestID());
}
else {
intime = request.cancel();
}
}
if (request == null) {
logger.warn("Received response that cannot be matched to any "+
"outstanding request, address="+
event.getPeerAddress() +
", requestID="+pdu.getRequestID());
}
else {
if (intime) {
// return response
request.listener.onResponse(new ResponseEvent(this,
request.pdu,
pdu,
request.userObject));
}
else {
// silently drop late response
if (logger.isInfoEnabled()) {
logger.info("Received late response from " +
event.getPeerAddress() +
" with request ID " + pdu.getRequestID());
}
}
}
}
else if (pdu.getType() == PDU.REPORT) {
event.setProcessed(true);
processReport(handle, event);
}
else {
if (logger.isDebugEnabled()) {
logger.debug("Fire process PDU event: "+event.toString());
}
fireProcessPdu(event);
}
}
With this fix a JUnit test with several 1000 requests with random timeouts
+/-200 ms around the response time of the agent where processed without
a hang.
A new SNMP4J version will be available tomorrow that includes these
fixes.
Best regards,
Frank
Frank Fock wrote:
> Brian,
>
> OK, many thanks for your feedback! The hang was caused by
> the "fix" in my last posting. The following should do the trick,
> however I could not test it thouroughly, but I am pretty confident
> that this one really fixes the problem.
>
> The root cause was a misunderstanding on my site on the return
> value of TimerTask.cancel() which returns false when the task
> has not been scheduled at all (I thought that it would return true
> then). Because it may return false, one need to synchronize on
> the "sync" object to make sure that PendingRequest.cancel()
> only returns false, if the task really had been run.
>
> Here are the relevant code changes for the PendingRequest.run()
> method:
>
> if (pendingRequests.get(key) != null) {
> logger.debug("Running pending request with handle" +
> key.getTransactionID());
> PendingRequest nextRetry = new PendingRequest(this);
> synchronized (sync) {
> pendingRequests.put(key, nextRetry);
> long delay =
> timeoutModel.getRetryTimeout(target.getRetries() -
> retryCount,
> target.getRetries(),
> target.getTimeout());
> timer.schedule(nextRetry, delay);
> }
> }
>
> Please let me know whether this fixes the bug.
>
> Thanks in advance,
> Frank
>
> Brian_Teravskis at cargill.com wrote:
>
>> Frank,
>>
>> I had my program hang twice in the last three attempts. It hung on
>> different devices polled. In each case I did not receive the
>> IllegalStateException that I had on the initial problem I was seeing, so
>> I'm not even sure these are related. Here is a dump of the log file
>> output of the last hung event (some of my code logging, and snmp4j
>> logging):
>>
>> 2004-08-18 10:01:42,624 [Thread-1] com.cargill.rap.MySnmp(67):
>> udp:yyy.yyy.yyy.yyy:161 2004-08-18 10:01:42,624 [Thread-1]
>> com.cargill.rap.MySnmp(152): SNMP walk of: 1.3.6.1.2.1.4.22.1.2
>> 2004-08-18 10:01:42,624 [Thread-1]
>> org.snmp4j.transport.DefaultUdpTransportMapping(75): Sending
>> message to yyy.yyy.yyy.yyy/161 with length 44:
>> 30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:07:
>> 38:22:f2:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>> 02:01:04:16:01:02:05:00 2004-08-18 10:01:42,624 [Thread-1]
>> org.snmp4j.Snmp(275): New pending request with handle 121119474
>> [..]
>> 2004-08-18 10:01:43,640 [Thread-2131]
>> org.snmp4j.transport.DefaultUdpTransportMapping(75): Sending
>> message to yyy.yyy.yyy.yyy/161 with length 44:
>> 30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:07:
>> 38:22:f2:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>> 02:01:04:16:01:02:05:00 2004-08-18 10:01:43,640 [Thread-2131]
>> org.snmp4j.Snmp$PendingRequest(711): running pending request
>> with handleorg.snmp4j.mp.PduHandle at 73822f2
>>
>> 2004-08-18 10:01:44,640 [Thread-1] com.cargill.rap.MySnmp(181):
>> SNMP Timeout! 2004-08-18 10:01:44,640 [Thread-1]
>> com.cargill.rap.MySnmp(167): Error while trying snmp GETNEXT:
>> java.lang.Exception: SNMP timeout.
>> at com.cargill.rap.MySnmp.processWalk(MySnmp.java:182)
>> at com.cargill.rap.MySnmp.snmpWalk(MySnmp.java:160)
>> at com.cargill.rap.MACData.fetchMACsFromRouter(MACData.java:39)
>> at
>> com.cargill.rap.ARPSlavePoller.startPolling(ARPSlavePoller.java:61)
>> at com.cargill.rap.ARPSlavePoller.run(ARPSlavePoller.java:40)
>> at java.lang.Thread.run(Unknown Source)
>> [..]
>> 2004-08-18 10:01:44,905 [Thread-2132]
>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159):
>> Received message from /yyy.yyy.yyy.yyy with length 56:
>> 30:36:02:01:00:04:06:70:75:62:6c:69:63:a2:29:02:04:07:
>> 38:22:f2:02:01:00:02:01:00:30:1b:30:19:06:0f:2b:06:01:
>> 02:01:04:16:01:02:01:0a:81:03:06:02:04:06:00:08:21:f1:
>> 40:10 2004-08-18 10:01:44,905 [Thread-2132]
>> org.snmp4j.Snmp(482): Removing pending request with
>> handleorg.snmp4j.mp.PduHandle at 73822f2 2004-08-18 10:01:44,905
>> [Thread-2132] org.snmp4j.Snmp(503): Received late response
>> from yyy.yyy.yyy.yyy/161 with request ID
>> 121119474 [..]
>> 2004-08-18 10:01:44,921 [Thread-1] com.cargill.rap.MySnmp(67):
>> udp:xxx.xxx.xxx.xxx:161 [..]
>> 2004-08-18 10:01:44,937 [Thread-1] com.cargill.rap.MySnmp(152):
>> SNMP walk of: 1.3.6.1.2.1.4.22.1.2 2004-08-18 10:01:44,937
>> [Thread-1] org.snmp4j.transport.DefaultUdpTransportMapping(75):
>> Sending message to xxx.xxx.xxx.xxx/161 with length 44:
>> 30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:22:
>> 15:c8:0a:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>> 02:01:04:16:01:02:05:00 2004-08-18 10:01:44,937 [Thread-1]
>> org.snmp4j.Snmp(275): New pending request with handle 571852810
>> [..]
>> 2004-08-18 10:01:45,983 [Thread-2135]
>> org.snmp4j.transport.DefaultUdpTransportMapping(75): Sending
>> message to xxx.xxx.xxx.xxx/161 with length 44:
>> 30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:22:
>> 15:c8:0a:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>> 02:01:04:16:01:02:05:00 [..]
>> 2004-08-18 10:01:45,999 [Thread-2136]
>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159):
>> Received message from /xxx.xxx.xxx.xxx with length 55:
>> 30:35:02:01:00:04:06:70:75:62:6c:69:63:a2:28:02:04:22:
>> 15:c8:0a:02:01:00:02:01:00:30:1a:30:18:06:0e:2b:06:01:
>> 02:01:04:16:01:02:01:0a:75:53:01:04:06:00:d0:ba:e0:33:
>> 12
>> [..]
>> 2004-08-18 10:01:46,046 [Thread-2136] org.snmp4j.Snmp(503):
>> Received late response from xxx.xxx.xxx.xxx/161 with request ID
>> 571852810 [..]
>> 2004-08-18 10:01:46,983 [Thread-2136]
>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159):
>> Received message from /xxx.xxx.xxx.xxx with length 55:
>> 30:35:02:01:00:04:06:70:75:62:6c:69:63:a2:28:02:04:22:
>> 15:c8:0a:02:01:00:02:01:00:30:1a:30:18:06:0e:2b:06:01:
>> 02:01:04:16:01:02:01:0a:75:53:01:04:06:00:d0:ba:e0:33:
>> 12 2004-08-18 10:01:46,983 [Thread-2136] org.snmp4j.Snmp(482):
>> Removing pending request with handle
>> org.snmp4j.mp.PduHandle at 2215c80a 2004-08-18 10:01:46,983
>> [Thread-2136] org.snmp4j.Snmp(486): Received response that
>> cannot be matched to any outstanding
>> request, address=xxx.xxx.xxx.xxx/161, requestID=571852810 [..]
>>
>> No other logs for Thread-1 after this.. It is hung waiting for a
>> response.
>>
>>
>> Things I noticed:
>>
>> 1. In this case I have the SNMP timeout set to 1000, and it appeared to
>> work in the prior SNMP request to yyy.yyy.yyy.yyy before it hung.
>> 2. I did receive a response to the first request too late, after the
>> timeout.
>> 3. I sent out a request to the device that it hung on (xxx.xxx.xxx.xxx)
>> 4. I received a response 16ms later, but it appeared not to be processed
>> as a message over 1 second later indicated the response was not in time
>> (even though it was in time!)
>> 5. I receive another response from xxx.xxx.xxx.xxx, and then the pending
>> request is removed, and the response is then not being matched to an
>> outstanding request
>> 6. Then nothing.. Thread-1 hangs indefinately. No return to my
>> snmp.sendPDU().
>>
>> I hope this helps nail down where the code is loosing it. I do a lot of
>> WAN SNMP Polling across the world, and therefore, it is a good test bed
>> for late responses and timeout issues.
>>
>> Let me know if I can supply any further information to assist in
>> debugging this problem. In the meantime I'm planning on writing an
>> asynchronous routine to watchdog the reponse, and if none received the
>> event is logged and the thread will go to the next device to poll.
>>
>> Thanks,
>>
>> Brian
>>
>>
>>
>> -----Original Message-----
>> From: fock at agentpp.com [mailto:fock at agentpp.com]
>> Sent: Tuesday, August 17, 2004 4:16 PM
>> To: Teravskis, Brian /mtks
>> Cc: snmp4j at agentpp.org
>> Subject: Re: [SNMP4J] IllegalStateException
>>
>>
>> I thought a small timeout would increase the probability, but
>> it could also be that the response came in just at the moment
>> a retry was sent out.
>>
>> Frank
>>
>> Brian_Teravskis at cargill.com wrote:
>>
>>
>>
>>> Hmmm, I thought 5000 for the timeout (5 seconds?) was plenty large,
>>> although I have been seeing late arriving SNMP responses at that as
>>> well. Some of our routers are at the far end of the world from here,
>>>
>>
>> and
>>
>>
>>> it can take a while for sure.
>>> I had it lock on me again today with no error. I upgraded to the latest
>>> version of SNMP4J, though the code in this area does not look much
>>> different. I will do more logging, and try your fix if I run into it
>>> again.
>>>
>>> Thanks,
>>>
>>> Brian
>>>
>>>
>>> -----Original Message-----
>>> From: fock at agentpp.com [mailto:fock at agentpp.com]
>>> Sent: Tuesday, August 17, 2004 3:22 PM
>>> To: Teravskis, Brian /mtks
>>> Cc: snmp4j at agentpp.org
>>> Subject: Re: [SNMP4J] IllegalStateException
>>>
>>>
>>> Hello Brian,
>>>
>>> You are probably using very small timeout values?
>>> Nevertheless, the SNMP4J code can be improved as
>>> follows to avoid this exception:
>>>
>>> if (pendingRequests.get(key) != null) {
>>> logger.debug("running pending request with handle" + key);
>>> PendingRequest nextRetry = new PendingRequest(this);
>>> long delay =
>>>
>>> timeoutModel.getRetryTimeout(target.getRetries()-retryCount,
>>> target.getRetries(),
>>> target.getTimeout());
>>> timer.schedule(nextRetry, delay);
>>> // register pending request not before it is scheduled
>>> pendingRequests.put(key, nextRetry);
>>> }
>>>
>>> Please let me know if this fixes the problem.
>>>
>>> Thanks in advance,
>>> Frank
>>>
>>> Brian_Teravskis at cargill.com wrote:
>>>
>>>
>>>
>>>
>>>
>>>> Hello,
>>>>
>>>> I'm using snmp4j for ARP cache gathering of network devices, and the
>>>> poller I wrote generated the following error message to the console:
>>>>
>>>> java.lang.IllegalStateException: Task already scheduled or cancelled
>>>> at java.util.Timer.sched(Unknown Source)
>>>> at java.util.Timer.schedule(Unknown Source)
>>>> at org.snmp4j.Snmp$PendingRequest.run(Snmp.java:670)
>>>> at java.util.TimerThread.mainLoop(Unknown Source)
>>>> at java.util.TimerThread.run(Unknown Source)
>>>>
>>>> The origin of the problem seems to be:
>>>>
>>>> public void run() {
>>>> if (retryCount > 0) {
>>>> retryCount--;
>>>> try {
>>>> send(pdu, target);
>>>> if (pendingRequests.get(key) != null) {
>>>> logger.debug("running pending request with handle" + key);
>>>> PendingRequest nextRetry = new PendingRequest(this);
>>>> pendingRequests.put(key, nextRetry);
>>>> long delay =
>>>>
>>>> timeoutModel.getRetryTimeout(target.getRetries()-retryCount,
>>>> target.getRetries(),
>>>> target.getTimeout());
>>>> --> timer.schedule(nextRetry, delay);
>>>> }
>>>> }
>>>>
>>>> It appears that this exception is not being caught and handled, and
>>>> therefore the error. My thread is catching IOException and then
>>>> Exception, and is not seeing this exception being thrown by the SNMP4J
>>>> routines. The thread I wrote appears to hang when this exception
>>>>
>>>>
>>>
>>> happens
>>>
>>>
>>>
>>>
>>>> waiting for a reponse. This is not a repeatable event. I've run this
>>>> poller several times without any issues. I'm doing the SNMP getnext
>>>>
>>>>
>>>
>>> with
>>>
>>>
>>>
>>>
>>>> a timeout of 5000 and a retry of 2.
>>>>
>>>> If any further information is required let me know.
>>>> _______________________________________________
>>>> SNMP4J mailing list
>>>> SNMP4J at agentpp.org
>>>> http://p15141779.pureserver.info/mailman/listinfo/snmp4j
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>
>
> _______________________________________________
> SNMP4J mailing list
> SNMP4J at agentpp.org
> http://p15141779.pureserver.info/mailman/listinfo/snmp4j
>
More information about the SNMP4J
mailing list