[SNMP4J] TableUtils - GetBulk responses - Possible race condition
Damian
damianstaro at hotmail.com
Fri Mar 14 22:43:22 CET 2008
Hi Frank,
I tried with version 1.9 and I am still able to replicate it.
I added the following instrumentation to TableUtils.java:
***************
*** 305,306 ****
--- 305,307 ----
private LinkedList rowCache = new LinkedList();
+ private Vector requestIds = new Vector();
protected Vector lastReceived;
***************
*** 399,400 ****
--- 400,402 ----
PDU response = event.getResponse();
+ requestIds.add(response.getRequestID());
int cols = request.size();
***************
*** 466,467 ****
--- 468,470 ----
TableEvent.STATUS_WRONG_ORDER));
+ logger.error("OIDs in wrong order. Request IDs processed
by TableUtils.TableRequest.onResponse: " + requestIds);
return;
This is the log message I got that makes me believe that the stack is
bubbling up the same request id twice:
2008-03-14 17:10:24,931 ERROR [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.error - OIDs in wrong order. Request IDs processed by
TableUtils.TableRequest.onResponse: [690986669, 690986671, 690986675,
690986679, 690986683, 690986687, 690986690, 690986694, 690986698, 690986702,
690986706, 690986710, 690986714, 690986718, 690986722, 690986726, 690986730,
690986734, 690986738, 690986742, 690986746, 690986749, 690986754, 690986757,
690986760, 690986764, 690986764]
In this message, we can see request-id 690986764 (29:2f:9f:0c) being
processed twice and I think that is the probable cause of the
STATUS_WRONG_ORDER.
I am attaching all the logs generated by snmp4j 1.9 in my test. I am also
attaching the modified version of TableUtils.java that I used.
These are the log messages that I believe pertain to the problem:
2008-03-14 17:10:24,192 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Running pending async request with handle
PduHandle[690986764] and retry count left 3
2008-03-14 17:10:24,193 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Sending message to 172.17.1.45/161 with length 101:
30:63:02:01:01:04:06:70:75:62:6c:69:63:a5:56:02:04:29:2f:9f:0c (cont) ...
2008-03-14 17:10:24,393 DEBUG [Timer-0] Log4jLogAdapter.debug - Running
pending async request with handle PduHandle[690986764] and retry count left
2
2008-03-14 17:10:24,394 DEBUG [Timer-0] Log4jLogAdapter.debug - Sending
message to 172.17.1.45/161 with length 101:
30:63:02:01:01:04:06:70:75:62:6c:69:63:a5:56:02:04:29:2f:9f:0c (cont) ...
2008-03-14 17:10:24,791 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Received message from /172.17.1.45/161 with length
1250:
30:82:04:de:02:01:01:04:06:70:75:62:6c:69:63:a2:82:04:cf:02:04:29:2f:9f:0c
(cont) ...
2008-03-14 17:10:24,791 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Looking up pending request with handle
PduHandle[690986764]
2008-03-14 17:10:24,793 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Cancelling pending request with handle
PduHandle[690986764]
2008-03-14 17:10:24,794 DEBUG [Timer-0] Log4jLogAdapter.debug - Running
pending async request with handle PduHandle[690986764] and retry count left
1
2008-03-14 17:10:24,794 DEBUG [Timer-0] Log4jLogAdapter.debug - Sending
message to 172.17.1.45/161 with length 101:
30:63:02:01:01:04:06:70:75:62:6c:69:63:a5:56:02:04:29:2f:9f:0c (cont) ...
2008-03-14 17:10:24,928 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Received message from /172.17.1.45/161 with length
1250:
30:82:04:de:02:01:01:04:06:70:75:62:6c:69:63:a2:82:04:cf:02:04:29:2f:9f:0c
(cont) ...
2008-03-14 17:10:24,930 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Looking up pending request with handle
PduHandle[690986764]
2008-03-14 17:10:24,930 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.debug - Cancelling pending request with handle null
2008-03-14 17:10:24,931 ERROR [DefaultUDPTransportMapping_0.0.0.0/0]
Log4jLogAdapter.error - OIDs in wrong order. Request IDs processed by
TableUtils.TableRequest.onResponse: [690986669, 690986671, 690986675,
690986679, 690986683, 690986687, 690986690, 690986694, 690986698, 690986702,
690986706, 690986710, 690986714, 690986718, 690986722, 690986726, 690986730,
690986734, 690986738, 690986742, 690986746, 690986749, 690986754, 690986757,
690986760, 690986764, 690986764]
Any help with this issue is truly appreciated.
Cheers,
Damian
--------------------------------------------------
From: "Frank Fock" <fock at agentpp.com>
Sent: Friday, March 14, 2008 2:52 PM
To: "Damian" <damianstaro at hotmail.com>
Cc: <snmp4j at agentpp.org>
Subject: Re: [SNMP4J] TableUtils - GetBulk responses - Possible race
condition
> Hi Damian,
>
> Please try to reproduce the issue with the latest version 1.9.
> There were a couple of changes since 1.8.1 which may
> have fixed the issue already.
>
> Best regards,
> Frank
>
> Damian wrote:
>> First of all, congrats for a great suite of products. I do mostly Java
>> SNMP managers and snmp4j has been great so far, I have been using it for
>> almost 4 years now. I also bought AgentPro and used it to generate
>> simulator code that helps us accelerate our testing process and also
>> easily reproduce conditions we experience with SNMP agents in the field.
>>
>> I suspect I might have found a race condition in the processing of
>> GetBulk responses within TableUtils. I am using version 1.8.1.
>>
>> I am using timeout values of 200ms, 400ms, 800ms and 1600ms (3 retries).
>> It seems that for this particular table the Switch is taking around 600ms
>> to reply.
>>
>> In this scenario, most of the time I get good results but once in a while
>> I get a TableEvent.STATUS_WRONG_ORDER error.
>>
>> I instrumented the TableUtils.onResponse method and I saw it process what
>> seemed to be the same PDU (same request ID) twice, and that seemed to
>> trigger the STATUS_WRONG_ORDER.
>>
>> I am attaching the SNMP4J logs I obtained as well as a Wireshark network
>> capture. The request ID that was processed twice was 1314028685.
>>
>> One of the things that puzzled me was that SNMP4J kept sending requests
>> with that request ID even after the response was received:
>>
>> 2008-03-10 15:28:49,433 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Running pending async request with handle
>> PduHandle[1314028685] and retry count left 3
>> 2008-03-10 15:28:49,634 DEBUG [Timer-0] Log4jLogAdapter.debug - Running
>> pending async request with handle PduHandle[1314028685] and retry count
>> left 2
>> 2008-03-10 15:28:50,032 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Looking up pending request with handle
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,032 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Cancelling pending request with handle
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,034 DEBUG [Timer-0] Log4jLogAdapter.debug - Running
>> pending async request with handle PduHandle[1314028685] and retry count
>> left 1
>> 2008-03-10 15:28:50,163 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Looking up pending request with handle
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,632 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Looking up pending request with handle
>> PduHandle[1314028685]
>> 2008-03-10 15:28:50,834 DEBUG [Timer-0] Log4jLogAdapter.debug - Running
>> pending async request with handle PduHandle[1314028685] and retry count
>> left 0
>> 2008-03-10 15:28:51,623 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Looking up pending request with handle
>> PduHandle[1314028685]
>> 2008-03-10 15:28:52,435 DEBUG [Timer-0] Log4jLogAdapter.debug - Request
>> timed out: 1314028685
>>
>> The following messages might be related:
>> 2008-03-10 15:28:50,163 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Cancelling pending request with handle null
>> 2008-03-10 15:28:50,633 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Cancelling pending request with handle null
>> 2008-03-10 15:28:51,624 DEBUG [DefaultUDPTransportMapping_0.0.0.0/0]
>> Log4jLogAdapter.debug - Cancelling pending request with handle null
>> 2008-03-10 15:28:52,436 DEBUG [Timer-0] Log4jLogAdapter.debug -
>> Cancelling pending request with handle null
>>
>> Thanks and please let me know if there's any additional information that
>> would help in this case.
>>
>> Damian
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> SNMP4J mailing list
>> SNMP4J at agentpp.org
>> http://lists.agentpp.org/mailman/listinfo/snmp4j
>
> --
> AGENT++
> http://www.agentpp.com
> http://www.mibexplorer.com
> http://www.mibdesigner.com
>
More information about the SNMP4J
mailing list