[SNMP4J] TableUtils - GetBulk responses - Possible race condition

Damian damianstaro at hotmail.com
Fri Mar 14 22:43:22 CET 2008


Hi Frank,

I tried with version 1.9 and I am still able to replicate it.

I added the following instrumentation to TableUtils.java:

***************
*** 305,306 ****
--- 305,307 ----
      private LinkedList rowCache = new LinkedList();
+     private Vector requestIds = new Vector();
      protected Vector lastReceived;
***************
*** 399,400 ****
--- 400,402 ----
          PDU response = event.getResponse();
+         requestIds.add(response.getRequestID());
          int cols = request.size();
***************
*** 466,467 ****
--- 468,470 ----
                                                   TableEvent.STATUS_WRONG_ORDER));
+                 logger.error("OIDs in wrong order. Request IDs processed 
by TableUtils.TableRequest.onResponse: " + requestIds);
                  return;

This is the log message I got that makes me believe that the stack is 
bubbling up the same request id twice:

2008-03-14 17:10:24,931 ERROR [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.error - OIDs in wrong order. Request IDs processed by 
TableUtils.TableRequest.onResponse: [690986669, 690986671, 690986675, 
690986679, 690986683, 690986687, 690986690, 690986694, 690986698, 690986702, 
690986706, 690986710, 690986714, 690986718, 690986722, 690986726, 690986730, 
690986734, 690986738, 690986742, 690986746, 690986749, 690986754, 690986757, 
690986760, 690986764, 690986764]

In this message, we can see request-id 690986764 (29:2f:9f:0c) being 
processed twice and I think that is the probable cause of the 
STATUS_WRONG_ORDER.

I am attaching all the logs generated by snmp4j 1.9 in my test. I am also 
attaching the modified version of TableUtils.java that I used.

These are the log messages that I believe pertain to the problem:

2008-03-14 17:10:24,192 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Running pending async request with handle 
PduHandle[690986764] and retry count left 3
2008-03-14 17:10:24,193 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Sending message to 172.17.1.45/161 with length 101: 
30:63:02:01:01:04:06:70:75:62:6c:69:63:a5:56:02:04:29:2f:9f:0c (cont) ...
2008-03-14 17:10:24,393 DEBUG [Timer-0] Log4jLogAdapter.debug - Running 
pending async request with handle PduHandle[690986764] and retry count left 
2
2008-03-14 17:10:24,394 DEBUG [Timer-0] Log4jLogAdapter.debug - Sending 
message to 172.17.1.45/161 with length 101: 
30:63:02:01:01:04:06:70:75:62:6c:69:63:a5:56:02:04:29:2f:9f:0c (cont) ...
2008-03-14 17:10:24,791 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Received message from /172.17.1.45/161 with length 
1250: 
30:82:04:de:02:01:01:04:06:70:75:62:6c:69:63:a2:82:04:cf:02:04:29:2f:9f:0c 
(cont) ...
2008-03-14 17:10:24,791 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Looking up pending request with handle 
PduHandle[690986764]
2008-03-14 17:10:24,793 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Cancelling pending request with handle 
PduHandle[690986764]
2008-03-14 17:10:24,794 DEBUG [Timer-0] Log4jLogAdapter.debug - Running 
pending async request with handle PduHandle[690986764] and retry count left 
1
2008-03-14 17:10:24,794 DEBUG [Timer-0] Log4jLogAdapter.debug - Sending 
message to 172.17.1.45/161 with length 101: 
30:63:02:01:01:04:06:70:75:62:6c:69:63:a5:56:02:04:29:2f:9f:0c (cont) ...
2008-03-14 17:10:24,928 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Received message from /172.17.1.45/161 with length 
1250: 
30:82:04:de:02:01:01:04:06:70:75:62:6c:69:63:a2:82:04:cf:02:04:29:2f:9f:0c 
(cont) ...
2008-03-14 17:10:24,930 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Looking up pending request with handle 
PduHandle[690986764]
2008-03-14 17:10:24,930 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.debug - Cancelling pending request with handle null
2008-03-14 17:10:24,931 ERROR [DefaultUDPTransportMapping_0.0.0.0/0] 
Log4jLogAdapter.error - OIDs in wrong order. Request IDs processed by 
TableUtils.TableRequest.onResponse: [690986669, 690986671, 690986675, 
690986679, 690986683, 690986687, 690986690, 690986694, 690986698, 690986702, 
690986706, 690986710, 690986714, 690986718, 690986722, 690986726, 690986730, 
690986734, 690986738, 690986742, 690986746, 690986749, 690986754, 690986757, 
690986760, 690986764, 690986764]

Any help with this issue is truly appreciated.

Cheers,
Damian

--------------------------------------------------
From: "Frank Fock" <fock at agentpp.com>
Sent: Friday, March 14, 2008 2:52 PM
To: "Damian" <damianstaro at hotmail.com>
Cc: <snmp4j at agentpp.org>
Subject: Re: [SNMP4J] TableUtils - GetBulk responses - Possible race 
condition

> Hi Damian,
>
> Please try to reproduce the issue with the latest version 1.9.
> There were a couple of changes since 1.8.1 which may
> have fixed the issue already.
>
> Best regards,
> Frank
>
> Damian wrote:
>> First of all, congrats for a great suite of products. I do mostly Java 
>> SNMP managers and snmp4j has been great so far, I have been using it for 
>> almost 4 years now. I also bought AgentPro and used it to generate 
>> simulator code that helps us accelerate our testing process and also 
>> easily reproduce conditions we experience with SNMP agents in the field.
>>
>> I suspect I might have found a race condition in the processing of 
>> GetBulk responses within TableUtils. I am using version 1.8.1.
>>
>> I am using timeout values of 200ms, 400ms, 800ms and 1600ms (3 retries). 
>> It seems that for this particular table the Switch is taking around 600ms 
>> to reply.
>>
>> In this scenario, most of the time I get good results but once in a while 
>> I get a TableEvent.STATUS_WRONG_ORDER error.
>>
>> I instrumented the TableUtils.onResponse method and I saw it process what 
>> seemed to be the same PDU (same request ID) twice, and that seemed to 
>> trigger the STATUS_WRONG_ORDER.
>>
>> I am attaching the SNMP4J logs I obtained as well as a Wireshark network 
>> capture. The request ID that was processed twice was 1314028685.
>>
>> One of the things that puzzled me was that SNMP4J kept sending requests 
>> with that request ID even after the response was received:
>>
>> 2008-03-10 15:28:49,433 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Running pending async request with handle 
>> PduHandle[1314028685] and retry count left 3
>> 2008-03-10 15:28:49,634 DEBUG [Timer-0] Log4jLogAdapter.debug - Running 
>> pending async request with handle PduHandle[1314028685] and retry count 
>> left 2
>> 2008-03-10 15:28:50,032 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Looking up pending request with handle 
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,032 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Cancelling pending request with handle 
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,034 DEBUG [Timer-0] Log4jLogAdapter.debug - Running 
>> pending async request with handle PduHandle[1314028685] and retry count 
>> left 1
>> 2008-03-10 15:28:50,163 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Looking up pending request with handle 
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,632 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Looking up pending request with handle 
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,834 DEBUG [Timer-0] Log4jLogAdapter.debug - Running 
>> pending async request with handle PduHandle[1314028685] and retry count 
>> left 0
>> 2008-03-10 15:28:51,623 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Looking up pending request with handle 
>> PduHandle[1314028685]
>> 2008-03-10 15:28:52,435 DEBUG [Timer-0] Log4jLogAdapter.debug - Request 
>> timed out: 1314028685
>>
>> The following messages might be related:
>> 2008-03-10 15:28:50,163 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Cancelling pending request with handle null
>> 2008-03-10 15:28:50,633 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Cancelling pending request with handle null
>> 2008-03-10 15:28:51,624 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0] 
>> Log4jLogAdapter.debug - Cancelling pending request with handle null
>> 2008-03-10 15:28:52,436 DEBUG [Timer-0] Log4jLogAdapter.debug - 
>> Cancelling pending request with handle null
>>
>> Thanks and please let me know if there's any additional information that 
>> would help in this case.
>>
>> Damian
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> SNMP4J mailing list
>> SNMP4J at agentpp.org
>> http://lists.agentpp.org/mailman/listinfo/snmp4j
>
> -- 
> AGENT++
> http://www.agentpp.com
> http://www.mibexplorer.com
> http://www.mibdesigner.com
> 


More information about the SNMP4J mailing list