[SNMP4J] TableUtils - GetBulk responses - Possible race condition

Damian damianstaro at hotmail.com
Fri Mar 14 15:30:00 CET 2008


First of all, congrats for a great suite of products. I do mostly Java SNMP managers and snmp4j has been great so far, I have been using it for almost 4 years now. 
I also bought AgentPro and used it to generate simulator code that helps us accelerate our testing process and also easily reproduce conditions we experience with SNMP agents in the field.

I suspect I might have found a race condition in the processing of GetBulk responses within TableUtils. 

I am using version 1.8.1.

I am using timeout values of 200ms, 400ms, 800ms and 1600ms (3 retries). It seems that for this particular table the Switch is taking around 600ms to reply.

In this scenario, most of the time I get good results but once in a while I get a TableEvent.STATUS_WRONG_ORDER error.

I instrumented the TableUtils.onResponse method and I saw it process what seemed to be the same PDU (same request ID) twice, and that seemed to trigger the STATUS_WRONG_ORDER.

I am attaching the SNMP4J logs I obtained as well as a Wireshark network capture. The request ID that was processed twice was 1314028685.

One of the things that puzzled me was that SNMP4J kept sending requests with that request ID even after the response was received:

2008-03-10 15:28:49,433 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Running pending async request with handle PduHandle[1314028685] and retry count left 3
2008-03-10 15:28:49,634 DEBUG [Timer-0] Log4jLogAdapter.debug - Running pending async request with handle PduHandle[1314028685] and retry count left 2
2008-03-10 15:28:50,032 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Looking up pending request with handle PduHandle[1314028685]
2008-03-10 15:28:50,032 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Cancelling pending request with handle PduHandle[1314028685]
2008-03-10 15:28:50,034 DEBUG [Timer-0] Log4jLogAdapter.debug - Running pending async request with handle PduHandle[1314028685] and retry count left 1
2008-03-10 15:28:50,163 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Looking up pending request with handle PduHandle[1314028685]
2008-03-10 15:28:50,632 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Looking up pending request with handle PduHandle[1314028685]
2008-03-10 15:28:50,834 DEBUG [Timer-0] Log4jLogAdapter.debug - Running pending async request with handle PduHandle[1314028685] and retry count left 0
2008-03-10 15:28:51,623 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Looking up pending request with handle PduHandle[1314028685]
2008-03-10 15:28:52,435 DEBUG [Timer-0] Log4jLogAdapter.debug - Request timed out: 1314028685

The following messages might be related:
2008-03-10 15:28:50,163 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Cancelling pending request with handle null
2008-03-10 15:28:50,633 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Cancelling pending request with handle null
2008-03-10 15:28:51,624 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] Log4jLogAdapter.debug - Cancelling pending request with handle null
2008-03-10 15:28:52,436 DEBUG [Timer-0] Log4jLogAdapter.debug - Cancelling pending request with handle null

Thanks and please let me know if there's any additional information that would help in this case.

Damian


More information about the SNMP4J mailing list