[SNMP4J] IllegalStateException

Frank Fock fock at agentpp.com
Thu Aug 19 14:17:52 CEST 2004


Hi Brian,

In the meantime I have written a JUnit test for the timeout
problem. It turned out that besides the below fix, the
processPdu needs to be fixed too as follows:

  public void processPdu(CommandResponderEvent event) {
    PduHandle handle = event.getPduHandle();
    PDU pdu = event.getPdu();
    if (pdu.getType() == PDU.RESPONSE) {
      event.setProcessed(true);
      PendingRequest request;
      boolean intime = true;
      synchronized (sync) {
        if (logger.isDebugEnabled()) {
          logger.debug("Removing pending request with handle" +
                       handle.getTransactionID());
        }
        request = (PendingRequest) pendingRequests.remove(handle);
        if (request == null) {
          logger.warn("Received response that cannot be matched to any "+
                      "outstanding request, address="+
                      event.getPeerAddress() +
                      ", requestID="+pdu.getRequestID());
        }
        else {
          intime = request.cancel();
        }
      }
      if (request == null) {
        logger.warn("Received response that cannot be matched to any "+
                    "outstanding request, address="+
                    event.getPeerAddress() +
                    ", requestID="+pdu.getRequestID());
      }
      else {
        if (intime) {
          // return response
          request.listener.onResponse(new ResponseEvent(this,
                                                        request.pdu,
                                                        pdu,
                                                        
request.userObject));
        }
        else {
          // silently drop late response
          if (logger.isInfoEnabled()) {
            logger.info("Received late response from " +
                        event.getPeerAddress() +
                        " with request ID " + pdu.getRequestID());
          }
        }
      }
    }
    else if (pdu.getType() == PDU.REPORT) {
      event.setProcessed(true);
      processReport(handle, event);
    }
    else {
      if (logger.isDebugEnabled()) {
        logger.debug("Fire process PDU event: "+event.toString());
      }
      fireProcessPdu(event);
    }
  }

With this fix a JUnit test with several 1000 requests with random timeouts
+/-200 ms around the response time of the agent where processed without
a hang.

A new SNMP4J version will be available tomorrow that includes these
fixes.

Best regards,
Frank

Frank Fock wrote:

> Brian,
>
> OK, many thanks for your feedback! The hang was caused by
> the "fix" in my last posting. The following should do the trick,
> however I could not test it thouroughly, but I am pretty confident
> that this one really fixes the problem.
>
> The root cause was  a misunderstanding on my site on the return
> value of TimerTask.cancel() which returns false  when the task
> has not been scheduled at all (I thought that it would return true
> then). Because it may return false, one need to synchronize on
> the "sync" object to make sure that PendingRequest.cancel()
> only returns false, if the task really had been run.
>
> Here are the relevant code changes for the PendingRequest.run()
> method:
>
>          if (pendingRequests.get(key) != null) {
>            logger.debug("Running pending request with handle" +
>                         key.getTransactionID());
>            PendingRequest nextRetry = new PendingRequest(this);
>            synchronized (sync) {
>              pendingRequests.put(key, nextRetry);
>              long delay =
>                  timeoutModel.getRetryTimeout(target.getRetries() - 
> retryCount,
>                                               target.getRetries(),
>                                               target.getTimeout());
>              timer.schedule(nextRetry, delay);
>            }
>          }
>
> Please let me know whether this fixes the bug.
>
> Thanks in advance,
> Frank
>
> Brian_Teravskis at cargill.com wrote:
>
>> Frank,
>>
>> I had my program hang twice in the last three attempts. It hung on
>> different devices polled. In each case I did not receive the
>> IllegalStateException that I had on the initial problem I was seeing, so
>> I'm not even sure these are related. Here is a dump of the log file
>> output of the last hung event (some of my code logging, and snmp4j
>> logging):
>>
>> 2004-08-18 10:01:42,624 [Thread-1]    com.cargill.rap.MySnmp(67): 
>>      udp:yyy.yyy.yyy.yyy:161 2004-08-18 10:01:42,624 [Thread-1]    
>> com.cargill.rap.MySnmp(152):      SNMP walk of: 1.3.6.1.2.1.4.22.1.2
>> 2004-08-18 10:01:42,624 [Thread-1]    
>> org.snmp4j.transport.DefaultUdpTransportMapping(75):      Sending 
>> message to yyy.yyy.yyy.yyy/161 with length 44:        
>> 30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:07:
>>        38:22:f2:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>>        02:01:04:16:01:02:05:00 2004-08-18 10:01:42,624 [Thread-1]    
>> org.snmp4j.Snmp(275):      New pending request with handle 121119474 
>> [..]
>> 2004-08-18 10:01:43,640 [Thread-2131]    
>> org.snmp4j.transport.DefaultUdpTransportMapping(75):      Sending 
>> message to yyy.yyy.yyy.yyy/161 with length 44:
>>        30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:07:
>>        38:22:f2:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>>        02:01:04:16:01:02:05:00 2004-08-18 10:01:43,640 [Thread-2131] 
>>    org.snmp4j.Snmp$PendingRequest(711):      running pending request 
>> with handleorg.snmp4j.mp.PduHandle at 73822f2
>>
>> 2004-08-18 10:01:44,640 [Thread-1]    com.cargill.rap.MySnmp(181): 
>>      SNMP Timeout! 2004-08-18 10:01:44,640 [Thread-1]    
>> com.cargill.rap.MySnmp(167):      Error while trying snmp GETNEXT:  
>> java.lang.Exception: SNMP timeout.
>>      at com.cargill.rap.MySnmp.processWalk(MySnmp.java:182)
>>      at com.cargill.rap.MySnmp.snmpWalk(MySnmp.java:160)
>>      at com.cargill.rap.MACData.fetchMACsFromRouter(MACData.java:39)
>>      at
>> com.cargill.rap.ARPSlavePoller.startPolling(ARPSlavePoller.java:61)
>>      at com.cargill.rap.ARPSlavePoller.run(ARPSlavePoller.java:40)
>>      at java.lang.Thread.run(Unknown Source)
>> [..]
>> 2004-08-18 10:01:44,905 [Thread-2132]    
>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159):
>>      Received message from /yyy.yyy.yyy.yyy with length 56:
>>        30:36:02:01:00:04:06:70:75:62:6c:69:63:a2:29:02:04:07:
>>        38:22:f2:02:01:00:02:01:00:30:1b:30:19:06:0f:2b:06:01:
>>        02:01:04:16:01:02:01:0a:81:03:06:02:04:06:00:08:21:f1:
>>        40:10 2004-08-18 10:01:44,905 [Thread-2132]    
>> org.snmp4j.Snmp(482):      Removing pending request with
>> handleorg.snmp4j.mp.PduHandle at 73822f2 2004-08-18 10:01:44,905 
>> [Thread-2132]    org.snmp4j.Snmp(503):      Received late response 
>> from yyy.yyy.yyy.yyy/161 with request ID
>> 121119474 [..]
>> 2004-08-18 10:01:44,921 [Thread-1]    com.cargill.rap.MySnmp(67): 
>>      udp:xxx.xxx.xxx.xxx:161 [..]
>> 2004-08-18 10:01:44,937 [Thread-1]    com.cargill.rap.MySnmp(152): 
>>      SNMP walk of: 1.3.6.1.2.1.4.22.1.2 2004-08-18 10:01:44,937 
>> [Thread-1]    org.snmp4j.transport.DefaultUdpTransportMapping(75): 
>>      Sending message to xxx.xxx.xxx.xxx/161 with length 44:
>>        30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:22:
>>        15:c8:0a:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>>        02:01:04:16:01:02:05:00 2004-08-18 10:01:44,937 [Thread-1] 
>> org.snmp4j.Snmp(275):      New pending request with handle 571852810 
>> [..]
>> 2004-08-18 10:01:45,983 [Thread-2135]    
>> org.snmp4j.transport.DefaultUdpTransportMapping(75):      Sending 
>> message to xxx.xxx.xxx.xxx/161 with length 44:
>>        30:2a:02:01:00:04:06:70:75:62:6c:69:63:a1:1d:02:04:22:
>>        15:c8:0a:02:01:00:02:01:00:30:0f:30:0d:06:09:2b:06:01:
>>        02:01:04:16:01:02:05:00 [..]
>> 2004-08-18 10:01:45,999 [Thread-2136]    
>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159):
>>      Received message from /xxx.xxx.xxx.xxx with length 55:
>>        30:35:02:01:00:04:06:70:75:62:6c:69:63:a2:28:02:04:22:
>>        15:c8:0a:02:01:00:02:01:00:30:1a:30:18:06:0e:2b:06:01:
>>        02:01:04:16:01:02:01:0a:75:53:01:04:06:00:d0:ba:e0:33:
>>        12
>> [..]
>> 2004-08-18 10:01:46,046 [Thread-2136]    org.snmp4j.Snmp(503):      
>> Received late response from xxx.xxx.xxx.xxx/161 with request ID
>> 571852810 [..]
>> 2004-08-18 10:01:46,983 [Thread-2136]    
>> org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread(159): 
>>      Received message from /xxx.xxx.xxx.xxx with length 55:
>>        30:35:02:01:00:04:06:70:75:62:6c:69:63:a2:28:02:04:22:
>>        15:c8:0a:02:01:00:02:01:00:30:1a:30:18:06:0e:2b:06:01:
>>        02:01:04:16:01:02:01:0a:75:53:01:04:06:00:d0:ba:e0:33:
>>        12 2004-08-18 10:01:46,983 [Thread-2136] org.snmp4j.Snmp(482): 
>>      Removing pending request with handle
>> org.snmp4j.mp.PduHandle at 2215c80a 2004-08-18 10:01:46,983 
>> [Thread-2136] org.snmp4j.Snmp(486):      Received response that 
>> cannot be matched to any outstanding
>> request,      address=xxx.xxx.xxx.xxx/161, requestID=571852810 [..]
>>
>> No other logs for Thread-1 after this.. It is hung waiting for a
>> response.
>>
>>
>> Things I noticed:
>>
>> 1. In this case I have the SNMP timeout set to 1000, and it appeared to
>> work in the prior SNMP request to yyy.yyy.yyy.yyy before it hung.
>> 2. I did receive a response to the first request too late, after the
>> timeout.
>> 3. I sent out a request to the device that it hung on (xxx.xxx.xxx.xxx)
>> 4. I received a response 16ms later, but it appeared not to be processed
>> as a message over 1 second later indicated the response was not in time
>> (even though it was in time!)
>> 5. I receive another response from xxx.xxx.xxx.xxx, and then the pending
>> request is removed, and the response is then not being matched to an
>> outstanding request
>> 6. Then nothing.. Thread-1 hangs indefinately. No return to my
>> snmp.sendPDU().
>>
>> I hope this helps nail down where the code is loosing it. I do a lot of
>> WAN SNMP Polling across the world, and therefore, it is a good test bed
>> for late responses and timeout issues.
>>
>> Let me know if I can supply any further information to assist in
>> debugging this problem. In the meantime I'm planning on writing an
>> asynchronous routine to watchdog the reponse, and if none received the
>> event is logged and the thread will go to the next device to poll.
>>
>> Thanks,
>>
>> Brian
>>
>>
>>
>> -----Original Message-----
>> From: fock at agentpp.com [mailto:fock at agentpp.com]
>> Sent: Tuesday, August 17, 2004 4:16 PM
>> To: Teravskis, Brian /mtks
>> Cc: snmp4j at agentpp.org
>> Subject: Re: [SNMP4J] IllegalStateException
>>
>>
>> I thought a small timeout would increase the probability, but
>> it could also be that the response came in just at the moment
>> a retry was sent out.
>>
>> Frank
>>
>> Brian_Teravskis at cargill.com wrote:
>>
>>  
>>
>>> Hmmm, I thought 5000 for the timeout (5 seconds?) was plenty large,
>>> although I have been seeing late arriving SNMP responses at that as
>>> well. Some of our routers are at the far end of the world from here,
>>>   
>>
>> and
>>  
>>
>>> it can take a while for sure.
>>> I had it lock on me again today with no error. I upgraded to the latest
>>> version of SNMP4J, though the code in this area does not look much
>>> different. I will do more logging, and try your fix if I run into it
>>> again.
>>>
>>> Thanks,
>>>
>>> Brian
>>>
>>>
>>> -----Original Message-----
>>> From: fock at agentpp.com [mailto:fock at agentpp.com]
>>> Sent: Tuesday, August 17, 2004 3:22 PM
>>> To: Teravskis, Brian /mtks
>>> Cc: snmp4j at agentpp.org
>>> Subject: Re: [SNMP4J] IllegalStateException
>>>
>>>
>>> Hello Brian,
>>>
>>> You are probably using very small timeout values?
>>> Nevertheless, the SNMP4J code can be improved as
>>> follows to avoid this exception:
>>>
>>>         if (pendingRequests.get(key) != null) {
>>>           logger.debug("running pending request with handle" + key);
>>>           PendingRequest nextRetry = new PendingRequest(this);
>>>           long delay =
>>>
>>> timeoutModel.getRetryTimeout(target.getRetries()-retryCount,
>>>                                            target.getRetries(),
>>>                                            target.getTimeout());
>>>           timer.schedule(nextRetry, delay);
>>>           // register pending request not before it is scheduled
>>>           pendingRequests.put(key, nextRetry);
>>>         }
>>>
>>> Please let me know if this fixes the problem.
>>>
>>> Thanks in advance,
>>> Frank
>>>
>>> Brian_Teravskis at cargill.com wrote:
>>>
>>>
>>>
>>>   
>>>
>>>> Hello,
>>>>
>>>> I'm using snmp4j for ARP cache gathering of network devices, and the
>>>> poller I wrote generated the following error message to the console:
>>>>
>>>> java.lang.IllegalStateException: Task already scheduled or cancelled
>>>>      at java.util.Timer.sched(Unknown Source)
>>>>      at java.util.Timer.schedule(Unknown Source)
>>>>      at org.snmp4j.Snmp$PendingRequest.run(Snmp.java:670)
>>>>      at java.util.TimerThread.mainLoop(Unknown Source)
>>>>      at java.util.TimerThread.run(Unknown Source)
>>>>
>>>> The origin of the problem seems to be:
>>>>
>>>>  public void run() {
>>>>    if (retryCount > 0) {
>>>>      retryCount--;
>>>>      try {
>>>>        send(pdu, target);
>>>>        if (pendingRequests.get(key) != null) {
>>>>          logger.debug("running pending request with handle" + key);
>>>>          PendingRequest nextRetry = new PendingRequest(this);
>>>>          pendingRequests.put(key, nextRetry);
>>>>          long delay =
>>>>
>>>> timeoutModel.getRetryTimeout(target.getRetries()-retryCount,
>>>>                                           target.getRetries(),
>>>>                                           target.getTimeout());
>>>> -->        timer.schedule(nextRetry, delay);
>>>>        }
>>>>      }
>>>>
>>>> It appears that this exception is not being caught and handled, and
>>>> therefore the error. My thread is catching IOException and then
>>>> Exception, and is not seeing this exception being thrown by the SNMP4J
>>>> routines. The thread I wrote appears to hang when this exception
>>>>  
>>>>     
>>>
>>> happens
>>>
>>>
>>>   
>>>
>>>> waiting for a reponse. This is not a repeatable event. I've run this
>>>> poller several times without any issues. I'm doing the SNMP getnext
>>>>  
>>>>     
>>>
>>> with
>>>
>>>
>>>   
>>>
>>>> a timeout of 5000 and a retry of 2.
>>>>
>>>> If any further information is required let me know.
>>>> _______________________________________________
>>>> SNMP4J mailing list
>>>> SNMP4J at agentpp.org
>>>> http://p15141779.pureserver.info/mailman/listinfo/snmp4j
>>>>
>>>>
>>>>
>>>>  
>>>>     
>>>
>>>
>>>
>>>
>>>
>>>   
>>
>>
>>
>>
>>
>>  
>>
>
>
> _______________________________________________
> SNMP4J mailing list
> SNMP4J at agentpp.org
> http://p15141779.pureserver.info/mailman/listinfo/snmp4j
>





More information about the SNMP4J mailing list