[SNMP4J] Timeouts in version 2.5.7+

Frank Fock fock at agentpp.com
Fri Apr 6 15:40:08 CEST 2018


Is your timeout value big enough?
I do not see any differences in the stacktraces, do I miss something?


> Am 06.04.2018 um 14:15 schrieb Steffen Brüntjen <Steffen.Bruentjen at macmon.eu>:
> 
> Hi Frank
> 
> Mmh, I don't think it's because of uncaught exceptions (at least not outside of SNMP4J). SNMP packets are continued to be sent out normally, the responses are received by kernel but not "recognized" by SNMP4J (see tcpdump output). I get normal timeouts as if the devices wouldn't respond any more (but they do). I reproduced the problem once again with TRACE logging enabled. The SNMP4J logs appear to be very normal, except that there are no more "Received message from ... with length ...: ..." messages any more (DefaultUdpTransportMapping.java:409). Here are the different types of log messages after problem occurs (in random order):
> 
> Sending message to .../161 with length 43: ... | (DefaultUdpTransportMapping.java:112)
> Removed pending request with handle: PduHandle[...] | (Snmp.java:995)
> Request timed out: ... | (Snmp.java:1900)
> Running pending sync request with handle PduHandle[...] and retry count left 1 | (Snmp.java:1808)
> 
> 
> The stacktraces look the same when everything is running fine:
> 
> "snmp.3" #39 prio=5 os_prio=0 tid=0x00007f682a81a000 nid=0x2a71 in Object.wait() [0x00007f67932f9000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861810> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.2" #38 prio=5 os_prio=0 tid=0x00007f682a818000 nid=0x2a70 in Object.wait() [0x00007f67933fa000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861a20> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.1" #37 prio=5 os_prio=0 tid=0x00007f6829986800 nid=0x2a6f in Object.wait() [0x00007f67934fb000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861c30> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.0" #36 prio=5 os_prio=0 tid=0x00007f6829988800 nid=0x2a6e in Object.wait() [0x00007f67935fc000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088861e40> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> 
> Here are the stacktraces after problem starts:
> 
> "snmp.3" #39 prio=5 os_prio=0 tid=0x00007f682a81a000 nid=0x2a71 in Object.wait() [0x00007f67932f9000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x00000000888185a0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.2" #38 prio=5 os_prio=0 tid=0x00007f682a818000 nid=0x2a70 in Object.wait() [0x00007f67933fa000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x00000000888187b0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.1" #37 prio=5 os_prio=0 tid=0x00007f6829986800 nid=0x2a6f in Object.wait() [0x00007f67934fb000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x00000000888189c0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> "snmp.0" #36 prio=5 os_prio=0 tid=0x00007f6829988800 nid=0x2a6e in Object.wait() [0x00007f67935fc000]
>   java.lang.Thread.State: WAITING (on object monitor)
>    at java.lang.Object.wait(Native Method)
>    at java.lang.Object.wait(Object.java:502)
>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>    - locked <0x0000000088818bd0> (a org.snmp4j.util.ThreadPool$TaskManager)
> 
> 
> 
> Best regards
> Steffen Brüntjen
> 
> 
> -----Original Message-----
> From: Frank Fock [mailto:fock at agentpp.com] 
> Sent: Mittwoch, 4. April 2018 00:54
> To: Steffen Brüntjen <Steffen.Bruentjen at macmon.eu>
> Cc: snmp4j at agentpp.org
> Subject: Re: [SNMP4J] Timeouts in version 2.5.7+
> 
> Hi Steffen,
> 
> From the stack trace it seems that there is no idle TaskManager in your ThreadPool for the MultiThreadMessageDispatcher left.
> Maybe all TaskManagers died because of exceptions?
> 
> You can simply use the DefaultMessageDispatcher in your case. I do not expect advantages from the multi-threaded variant. Maybe you can find the root cause easier with the simple message dispatcher.
> 
> Best regards,
> Frank 
> 
> 
>> On 3. Apr 2018, at 13:26, Steffen Brüntjen <Steffen.Bruentjen at macmon.eu> wrote:
>> 
>> Hi Frank
>> 
>> 
>> Thanks for your reply. I don't reuse PDUs. In case of SNMPv2c, the effective statements would be:
>> 
>> PDU pdu = new PDU();
>> pdu.setType(PDU.GET);
>> pdu.add(new VariableBinding(...));
>> ResponseEvent responseEvent = snmp.send(pdu, target); // timeout
>> 
>> I believe the PDU request IDs will be coming from this code in MessageDispatcherImpl:
>> 
>> protected PduHandle createPduHandle() {
>>   return new PduHandle(getNextRequestID());
>> }
>> 
>> 
>> There's only one static snmp instance for all targets, I read somewhere that this is the preferred implementation.
>> 
>> 
>> I'am also able to remote debug the program, but I wouldn't know where to look at. Also, it's a bit of extra work and it will take a while, but it's possible.
>> 
>> 
>> Best regards,
>> Steffen
>> 
>> 
>> -----Original Message-----
>> From: Frank Fock [mailto:fock at agentpp.com] 
>> Sent: Sonntag, 1. April 2018 08:23
>> To: Steffen Brüntjen <Steffen.Bruentjen at macmon.eu>
>> Cc: snmp4j at agentpp.org
>> Subject: Re: [SNMP4J] Timeouts in version 2.5.7+
>> 
>> Hi Steffen,
>> 
>> There was no change on the DefaultUdpTransportMapping between 2.5.6 and 2.5.7, thus I assume that behaviour change is a side effect of some other changes. SNMP4J is no a bit faster.
>> Have you checked your code for race conditions?
>> When you use Snmp.send, how do you create the request IDs of the PDUs? 
>> 
>> Also reusing the PDU objects before the request is finished might cause this issue. 
>> 
>> Best regards,
>> Frank  
>> 
>>> On 16. Mar 2018, at 12:14, Steffen Brüntjen <Steffen.Bruentjen at macmon.eu> wrote:
>>> 
>>> Hi!
>>> 
>>> I'm using SNMP4J for monitoring around 200 devices. It has been running stably for months. Now I found a problem in the latest releases: When my process runs for a couple of hours (sometimes days), after some time SNMP4J returns nothing but timeouts. My program initially uses Snmp.send() and TableUtils (I think, this doesn't matter). After the problem shows up once, all my program effectively does is querying the SysObjectIds (because this already results in timeouts):
>>> 
>>>  org.snmp4j.Snmp.send(get(SnmpV2Mib.sysObjectID), target)
>>> 
>>> 
>>> Some of the targets are community targets, some are user targets, but I get timeouts for all devices. I repeated testing with only community targets resulting in the same behavior. I tracked down the problem to release 2.5.7. The previous version 2.5.6 works without problems, all releases starting with 2.5.7 show the problem. I also analyzed the SNMP packages using tcpdump. The devices send their answers quickly, the packets don't look any special. It seems that SNMP4J doesn't receive, recognize or process the UDP packets.
>>> 
>>> Can you reproduce the problem? Does anyone see problems in source code changes from 2.5.6 to 2.5.7?
>>> 
>>> Best regards
>>> Steffen Brüntjen
>>> 
>>> 
>>> 
>>> Here are some stacktraces at the time the problem exists:
>>> 
>>> "DefaultUDPTransportMapping_0.0.0.0/162" #72 daemon prio=5 os_prio=0 tid=0x00007fa0ac00a000 nid=0x4433 runnable [0x00007fa08b4f3000]
>>> java.lang.Thread.State: RUNNABLE
>>>    at java.net.PlainDatagramSocketImpl.receive0(Native Method)
>>>    - locked <0x0000000089977c28> (a java.net.PlainDatagramSocketImpl)
>>>    at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143)
>>>    - locked <0x0000000089977c28> (a java.net.PlainDatagramSocketImpl)
>>>    at java.net.DatagramSocket.receive(DatagramSocket.java:812)
>>>    - locked <0x0000000089b2b3a8> (a java.net.DatagramPacket)
>>>    - locked <0x0000000089977c78> (a java.net.DatagramSocket)
>>>    at org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread.run(DefaultUdpTransportMapping.java:397)
>>>    at java.lang.Thread.run(Thread.java:745)
>>> 
>>> "DefaultUDPTransportMapping_0.0.0.0/0" #33 daemon prio=5 os_prio=0 tid=0x00007fa126c9d800 nid=0x440c in Object.wait() [0x00007fa0b60f1000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool.execute(ThreadPool.java:103)
>>>    - locked <0x00000000883cde50> (a org.snmp4j.util.ThreadPool)
>>>    at org.snmp4j.util.MultiThreadedMessageDispatcher.processMessage(MultiThreadedMessageDispatcher.java:162)
>>>    at org.snmp4j.transport.AbstractTransportMapping.fireProcessMessage(AbstractTransportMapping.java:76)
>>>    at org.snmp4j.transport.DefaultUdpTransportMapping$ListenThread.run(DefaultUdpTransportMapping.java:430)
>>>    at java.lang.Thread.run(Thread.java:745)
>>> 
>>> "snmp.3" #37 prio=5 os_prio=0 tid=0x00007fa125a38000 nid=0x4410 in Object.wait() [0x00007fa0b5ced000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883ddeb8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "snmp.2" #36 prio=5 os_prio=0 tid=0x00007fa126ca1800 nid=0x440f in Object.wait() [0x00007fa0b5dee000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883de0c8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "snmp.1" #35 prio=5 os_prio=0 tid=0x00007fa126ca0800 nid=0x440e in Object.wait() [0x00007fa0b5eef000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883de2d8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "snmp.0" #34 prio=5 os_prio=0 tid=0x00007fa126ca0000 nid=0x440d in Object.wait() [0x00007fa0b5ff0000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x00000000883de4e8> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> "trap.0" #71 prio=5 os_prio=0 tid=0x00007fa0ac004000 nid=0x4432 in Object.wait() [0x00007fa08b5f4000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at java.lang.Object.wait(Object.java:502)
>>>    at org.snmp4j.util.ThreadPool$TaskManager.run(ThreadPool.java:275)
>>>    - locked <0x0000000089979080> (a org.snmp4j.util.ThreadPool$TaskManager)
>>> 
>>> 
>>> And then there are some monitoring threads that send SNMP queries:
>>> 
>>> "mon-001" #43 prio=5 os_prio=0 tid=0x00007fa0c000c000 nid=0x4416 in Object.wait() [0x00007fa0b52e7000]
>>> java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>    at java.lang.Object.wait(Native Method)
>>>    at org.snmp4j.Snmp.send(Snmp.java:991)
>>>    - locked <0x00000000fdc0c6e0> (a org.snmp4j.Snmp$SyncResponseListener)
>>>    at org.snmp4j.Snmp.send(Snmp.java:963)
>>>    at org.snmp4j.Snmp.send(Snmp.java:928)
>>>   at [...]
>>> 
>>> 
>>> 
>>> And here's some non-binary tcpdump output:
>>> 
>>> 12:05:49.820810 IP 192.168.101.211.57522 > 10.100.7.204.snmp:  GetRequest(28)  system.sysObjectID.0
>>> E..G.. at .@. at ...e.
>>> d.......38.0).....public..../.........0.0...+.........
>>> 12:05:49.821321 IP 10.100.7.204.snmp > 192.168.101.211.57522:  GetResponse(37)  system.sysObjectID.0=E:cisco.1.614
>>> E..P.. at .@.SW
>>> d....e......<..02.....public.%../.........0.0...+........       +....   ..f
>>> ^C
>>> _______________________________________________
>>> SNMP4J mailing list
>>> SNMP4J at agentpp.org
>>> https://oosnmp.net/mailman/listinfo/snmp4j
>> 
> 



More information about the SNMP4J mailing list