[SNMP4J] IllegalStateException

Frank Fock fock at agentpp.com
Wed Aug 18 21:38:36 CEST 2004


Brian,

OK, many thanks for your feedback! The hang was caused by
the "fix" in my last posting. The following should do the trick,
however I could not test it thouroughly, but I am pretty confident
that this one really fixes the problem.

The root cause was  a misunderstanding on my site on the return
value of TimerTask.cancel() which returns false  when the task
has not been scheduled at all (I thought that it would return true
then). Because it may return false, one need to synchronize on
the "sync" object to make sure that PendingRequest.cancel()
only returns false, if the task really had been run.

Here are the relevant code changes for the PendingRequest.run()
method:

          if (pendingRequests.get(key) != null) {
            logger.debug("Running pending request with handle" +
                         key.getTransactionID());
            PendingRequest nextRetry = new PendingRequest(this);
            synchronized (sync) {
              pendingRequests.put(key, nextRetry);
              long delay =
                  timeoutModel.getRetryTimeout(target.getRetries() - 
retryCount,
                                               target.getRetries(),
                                               target.getTimeout());
              timer.schedule(nextRetry, delay);
            }
          }

Please let me know whether this fixes the bug.

Thanks in advance,
Frank

Brian_Teravskis at cargill.com wrote:

>Frank,
>
>I had my program hang twice in the last three attempts. It hung on
>different devices polled. In each case I did not receive the
>IllegalStateException that I had on the initial problem I was seeing, so
>I'm not even sure these are related. Here is a dump of the log file
>output of the last hung event (some of my code logging, and snmp4j
>logging):
>
>2004-08-18 10:01:42,624 [Thread-1] 
>    com.cargill.rap.MySnmp(67): 
>      udp:yyy.yyy.yyy.yyy:161 
>2004-08-18 10:01:42,624 [Thread-1] 
>    com.cargill.rap.MySnmp(152): 
>      SNMP walk of: 1.3.6.1.2.1.4.22.1.2
>2004-08-18 10:01:42,624 [Thread-1] 
>    org.snmp4j.transport.DefaultUdpTransportMapping(75): 
>      Sending message to yyy.yyy.yyy.yyy/161 with length 44: 
>        30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:07:
>        38:22:f2:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>        02:01:04:16:01:02:05:00 
>2004-08-18 10:01:42,624 [Thread-1] 
>    org.snmp4j.Snmp(275): 
>      New pending request with handle 121119474 
>[..]
>2004-08-18 10:01:43,640 [Thread-2131] 
>    org.snmp4j.transport.DefaultUdpTransportMapping(75): 
>      Sending message to yyy.yyy.yyy.yyy/161 with length 44:
>        30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:07:
>        38:22:f2:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>        02:01:04:16:01:02:05:00 
>2004-08-18 10:01:43,640 [Thread-2131] 
>    org.snmp4j.Snmp$PendingRequest(711): 
>      running pending request with handleorg.snmp4j.mp.PduHandle at 73822f2
>
>2004-08-18 10:01:44,640 [Thread-1] 
>    com.cargill.rap.MySnmp(181): 
>      SNMP Timeout! 
>2004-08-18 10:01:44,640 [Thread-1] 
>    com.cargill.rap.MySnmp(167): 
>      Error while trying snmp GETNEXT:  
>java.lang.Exception: SNMP timeout.
>      at com.cargill.rap.MySnmp.processWalk(MySnmp.java:182)
>      at com.cargill.rap.MySnmp.snmpWalk(MySnmp.java:160)
>      at com.cargill.rap.MACData.fetchMACsFromRouter(MACData.java:39)
>      at
>com.cargill.rap.ARPSlavePoller.startPolling(ARPSlavePoller.java:61)
>      at com.cargill.rap.ARPSlavePoller.run(ARPSlavePoller.java:40)
>      at java.lang.Thread.run(Unknown Source)
>[..]
>2004-08-18 10:01:44,905 [Thread-2132] 
>    org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159):
>      Received message from /yyy.yyy.yyy.yyy with length 56:
>        30:36:02:01:00:04:06:70:75:62:6c:69:63:a2:29:02:04:07:
>        38:22:f2:02:01:00:02:01:00:30:1b:30:19:06:0f:2b:06:01:
>        02:01:04:16:01:02:01:0a:81:03:06:02:04:06:00:08:21:f1:
>        40:10 
>2004-08-18 10:01:44,905 [Thread-2132] 
>    org.snmp4j.Snmp(482): 
>      Removing pending request with
>handleorg.snmp4j.mp.PduHandle at 73822f2 
>2004-08-18 10:01:44,905 [Thread-2132] 
>    org.snmp4j.Snmp(503): 
>      Received late response from yyy.yyy.yyy.yyy/161 with request ID
>121119474 
>[..]
>2004-08-18 10:01:44,921 [Thread-1] 
>    com.cargill.rap.MySnmp(67): 
>      udp:xxx.xxx.xxx.xxx:161 
>[..]
>2004-08-18 10:01:44,937 [Thread-1] 
>    com.cargill.rap.MySnmp(152): 
>      SNMP walk of: 1.3.6.1.2.1.4.22.1.2 
>2004-08-18 10:01:44,937 [Thread-1] 
>    org.snmp4j.transport.DefaultUdpTransportMapping(75): 
>      Sending message to xxx.xxx.xxx.xxx/161 with length 44:
>        30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:22:
>        15:c8:0a:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>        02:01:04:16:01:02:05:00 
>2004-08-18 10:01:44,937 [Thread-1] org.snmp4j.Snmp(275): 
>      New pending request with handle 571852810 
>[..]
>2004-08-18 10:01:45,983 [Thread-2135] 
>    org.snmp4j.transport.DefaultUdpTransportMapping(75): 
>      Sending message to xxx.xxx.xxx.xxx/161 with length 44:
>        30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:22:
>        15:c8:0a:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>        02:01:04:16:01:02:05:00 
>[..]
>2004-08-18 10:01:45,999 [Thread-2136] 
>    org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159):
>      Received message from /xxx.xxx.xxx.xxx with length 55:
>        30:35:02:01:00:04:06:70:75:62:6c:69:63:a2:28:02:04:22:
>        15:c8:0a:02:01:00:02:01:00:30:1a:30:18:06:0e:2b:06:01:
>        02:01:04:16:01:02:01:0a:75:53:01:04:06:00:d0:ba:e0:33:
>        12
>[..]
>2004-08-18 10:01:46,046 [Thread-2136] 
>    org.snmp4j.Snmp(503): 
>      Received late response from xxx.xxx.xxx.xxx/161 with request ID
>571852810 
>[..]
>2004-08-18 10:01:46,983 [Thread-2136] 
>    org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159): 
>      Received message from /xxx.xxx.xxx.xxx with length 55:
>        30:35:02:01:00:04:06:70:75:62:6c:69:63:a2:28:02:04:22:
>        15:c8:0a:02:01:00:02:01:00:30:1a:30:18:06:0e:2b:06:01:
>        02:01:04:16:01:02:01:0a:75:53:01:04:06:00:d0:ba:e0:33:
>        12 
>2004-08-18 10:01:46,983 [Thread-2136] org.snmp4j.Snmp(482): 
>      Removing pending request with handle
>org.snmp4j.mp.PduHandle at 2215c80a 
>2004-08-18 10:01:46,983 [Thread-2136] org.snmp4j.Snmp(486): 
>      Received response that cannot be matched to any outstanding
>request, 
>      address=xxx.xxx.xxx.xxx/161, requestID=571852810 
>[..]
>
>No other logs for Thread-1 after this.. It is hung waiting for a
>response.
>
>
>Things I noticed:
>
>1. In this case I have the SNMP timeout set to 1000, and it appeared to
>work in the prior SNMP request to yyy.yyy.yyy.yyy before it hung.
>2. I did receive a response to the first request too late, after the
>timeout.
>3. I sent out a request to the device that it hung on (xxx.xxx.xxx.xxx)
>4. I received a response 16ms later, but it appeared not to be processed
>as a message over 1 second later indicated the response was not in time
>(even though it was in time!)
>5. I receive another response from xxx.xxx.xxx.xxx, and then the pending
>request is removed, and the response is then not being matched to an
>outstanding request
>6. Then nothing.. Thread-1 hangs indefinately. No return to my
>snmp.sendPDU().
>
>I hope this helps nail down where the code is loosing it. I do a lot of
>WAN SNMP Polling across the world, and therefore, it is a good test bed
>for late responses and timeout issues.
>
>Let me know if I can supply any further information to assist in
>debugging this problem. In the meantime I'm planning on writing an
>asynchronous routine to watchdog the reponse, and if none received the
>event is logged and the thread will go to the next device to poll.
>
>Thanks,
>
>Brian
>
>
>
>-----Original Message-----
>From: fock at agentpp.com [mailto:fock at agentpp.com]
>Sent: Tuesday, August 17, 2004 4:16 PM
>To: Teravskis, Brian /mtks
>Cc: snmp4j at agentpp.org
>Subject: Re: [SNMP4J] IllegalStateException
>
>
>I thought a small timeout would increase the probability, but
>it could also be that the response came in just at the moment
>a retry was sent out.
>
>Frank
>
>Brian_Teravskis at cargill.com wrote:
>
>  
>
>>Hmmm, I thought 5000 for the timeout (5 seconds?) was plenty large,
>>although I have been seeing late arriving SNMP responses at that as
>>well. Some of our routers are at the far end of the world from here,
>>    
>>
>and
>  
>
>>it can take a while for sure. 
>>
>>I had it lock on me again today with no error. I upgraded to the latest
>>version of SNMP4J, though the code in this area does not look much
>>different. I will do more logging, and try your fix if I run into it
>>again.
>>
>>Thanks,
>>
>>Brian
>>
>>
>>-----Original Message-----
>>From: fock at agentpp.com [mailto:fock at agentpp.com]
>>Sent: Tuesday, August 17, 2004 3:22 PM
>>To: Teravskis, Brian /mtks
>>Cc: snmp4j at agentpp.org
>>Subject: Re: [SNMP4J] IllegalStateException
>>
>>
>>Hello Brian,
>>
>>You are probably using very small timeout values?
>>Nevertheless, the SNMP4J code can be improved as
>>follows to avoid this exception:
>>
>>         if (pendingRequests.get(key) != null) {
>>           logger.debug("running pending request with handle" + key);
>>           PendingRequest nextRetry = new PendingRequest(this);
>>           long delay =
>>
>>timeoutModel.getRetryTimeout(target.getRetries()-retryCount,
>>                                            target.getRetries(),
>>                                            target.getTimeout());
>>           timer.schedule(nextRetry, delay);
>>           // register pending request not before it is scheduled
>>           pendingRequests.put(key, nextRetry);
>>         }
>>
>>Please let me know if this fixes the problem.
>>
>>Thanks in advance,
>>Frank
>>
>>Brian_Teravskis at cargill.com wrote:
>>
>> 
>>
>>    
>>
>>>Hello,
>>>
>>>I'm using snmp4j for ARP cache gathering of network devices, and the
>>>poller I wrote generated the following error message to the console:
>>>
>>>java.lang.IllegalStateException: Task already scheduled or cancelled
>>>      at java.util.Timer.sched(Unknown Source)
>>>      at java.util.Timer.schedule(Unknown Source)
>>>      at org.snmp4j.Snmp$PendingRequest.run(Snmp.java:670)
>>>      at java.util.TimerThread.mainLoop(Unknown Source)
>>>      at java.util.TimerThread.run(Unknown Source)
>>>
>>>The origin of the problem seems to be:
>>>
>>>  public void run() {
>>>    if (retryCount > 0) {
>>>      retryCount--;
>>>      try {
>>>        send(pdu, target);
>>>        if (pendingRequests.get(key) != null) {
>>>          logger.debug("running pending request with handle" + key);
>>>          PendingRequest nextRetry = new PendingRequest(this);
>>>          pendingRequests.put(key, nextRetry);
>>>          long delay =
>>>
>>>timeoutModel.getRetryTimeout(target.getRetries()-retryCount,
>>>                                           target.getRetries(),
>>>                                           target.getTimeout());
>>>-->        timer.schedule(nextRetry, delay);
>>>        }
>>>      }
>>>
>>>It appears that this exception is not being caught and handled, and
>>>therefore the error. My thread is catching IOException and then
>>>Exception, and is not seeing this exception being thrown by the SNMP4J
>>>routines. The thread I wrote appears to hang when this exception
>>>   
>>>
>>>      
>>>
>>happens
>> 
>>
>>    
>>
>>>waiting for a reponse. This is not a repeatable event. I've run this
>>>poller several times without any issues. I'm doing the SNMP getnext
>>>   
>>>
>>>      
>>>
>>with
>> 
>>
>>    
>>
>>>a timeout of 5000 and a retry of 2.
>>>
>>>If any further information is required let me know.
>>>_______________________________________________
>>>SNMP4J mailing list
>>>SNMP4J at agentpp.org
>>>http://p15141779.pureserver.info/mailman/listinfo/snmp4j
>>>
>>>
>>>
>>>   
>>>
>>>      
>>>
>>
>>
>> 
>>
>>    
>>
>
>
>
>
>  
>





More information about the SNMP4J mailing list