[SNMP4J] SNMP4J, UDP, network buffers and EPERM

Frank Fock fock at agentpp.com
Wed Oct 15 09:43:58 CEST 2014


Hi David,

SNMP4J does not implement "transmit pacing" because in (pure) Java you
cannot get any information about UDP buffer usage. Slowing down the
sending of messages in general would not be acceptable for 99% of the
users and use cases.

To control burst sending of messages, many large network management
applications use their own centrally implemented throttle mechanisms,
which apply mostly to trap-directed polling (for example, in a
thunderstorm situation, when many network elements send alarm traps at
once).
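
As a rough illustration of such an application-level throttle, here is a
minimal sketch in plain Java. The class SendPacer and its methods are
hypothetical names chosen for this example; they are not part of SNMP4J,
and the fixed minimum-interval policy is only one possible design.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of SNMP4J): spaces out sends so that
 * at most maxSendsPerSecond messages leave per second, shared by all
 * threads that use the same instance.
 */
final class SendPacer {
    private final long intervalNanos; // minimum gap between two sends
    private long nextFreeSlot;        // nanoTime value of the next free send slot
    private boolean started;          // false until the first reservation

    SendPacer(int maxSendsPerSecond) {
        if (maxSendsPerSecond <= 0) {
            throw new IllegalArgumentException("maxSendsPerSecond must be > 0");
        }
        this.intervalNanos = TimeUnit.SECONDS.toNanos(1) / maxSendsPerSecond;
    }

    /**
     * Reserves the next send slot relative to the given clock reading and
     * returns how many nanoseconds the caller has to wait before sending.
     */
    synchronized long reserve(long nowNanos) {
        // If the pacer was idle, the next slot starts right now.
        // Subtraction keeps the comparison safe for System.nanoTime() values.
        if (!started || nowNanos - nextFreeSlot > 0) {
            nextFreeSlot = nowNanos;
            started = true;
        }
        long waitNanos = nextFreeSlot - nowNanos;
        nextFreeSlot += intervalNanos;
        return waitNanos;
    }

    /** Blocks the calling thread until its reserved send slot is reached. */
    void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

An application would share one SendPacer among all of its sending
threads and call acquire() immediately before each send. This only
limits the send rate; it cannot observe the UDP buffer fill level, so a
suitable rate has to be found by experiment.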

SNMP4J-Model, a high-level API for client SNMP development that is
currently under development, will provide means to centrally control
load on transport ports and can be used to implement such "transmit
pacing".

Best regards,
Frank

On 14.10.2014 12:50, David Corbin wrote:
> TL;DR: Does SNMP4J provide "transmit pacing" for UDP?  Does it handle
> Linux's "EPERM" error when buffers overflow? (By handle I mean beyond
> throwing an exception and failing).
>
> I'm going to start with some background.  We have a large complex (overly
> so) system that monitors some "stuff" using SNMP4J.  It generally works.  I
> have an integration test suite that drives our system. We use a dedicated
> machine with dozens of IP addresses that is programmed to respond to SNMP
> requests in a way that the integration test suite expects.  For the
> purposes of this conversation, there's one specific test class with 11
> tests, and when I run it against our production code all of its tests
> pass very consistently.   Our complex system depends on Mule and JMS.  Most
> if not all SNMP requests being made are being sent through JMS to an
> SnmpExecutor service.  That Mule service in turn calls a Java class
> (SnmpQueryExecutor) to synchronously resolve the request for SNMP data (the
> SNMP request is asynchronous, but the Java code has its own wait for the
> answer or timeout) before proceeding.  The JMS client blocks waiting for
> the SNMP request to complete (or timeout) before continuing on.
>
> In an attempt to simplify our complex system, I made a refactoring (on some
> execution paths) to call the Java class directly, bypassing the JMS and
> Mule part.  The integration tests now fail intermittently.  There are about
> 4 tests that sometimes fail.  The nature of the failures is also not
> consistent.  Initially, one of them would fail on most runs (4 out of 5).
> After reducing the code-paths that use the new code to exactly one,
> this has dropped to about 1 failure every 3 or 4 runs.  I've learned that I
> can make the test pass reliably by adding a 1 second delay to the new
> code-path.  This change was for investigative purposes only.   It suggests
> that there is a race condition or some type of failure that is related to
> doing too much SNMP too fast, and since Mule and JMS add a fair amount of
> overhead, it's probably been masked for some time.
>
> After a lot of work, I was able to discover that some of the failures are
> caused by this exception:
> java.io.IOException: Operation not permitted
>      at java.net.PlainDatagramSocketImpl.send(Native Method)
>      at java.net.DatagramSocket.send(DatagramSocket.java:676)
>      at
> org.snmp4j.transport.DefaultUdpTransportMapping.sendMessage(DefaultUdpTransportMapping.java:117)
>      at
> org.snmp4j.transport.DefaultUdpTransportMapping.sendMessage(DefaultUdpTransportMapping.java:42)
>      at
> org.snmp4j.MessageDispatcherImpl.sendMessage(MessageDispatcherImpl.java:198)
>      at
> org.snmp4j.MessageDispatcherImpl.sendPdu(MessageDispatcherImpl.java:498)
>      at
> org.snmp4j.util.MultiThreadedMessageDispatcher.sendPdu(MultiThreadedMessageDispatcher.java:127)
>      at org.snmp4j.Snmp.sendMessage(Snmp.java:1004)
>      at org.snmp4j.Snmp.send(Snmp.java:974)
>      at org.snmp4j.Snmp.send(Snmp.java:958)
>      ....
>
> While digging for information about this, I found this thread
> https://github.com/typesafehub/play-plugins/issues/64 which suggests at the
> end that this error happens on Linux systems when the network buffers get
> full. (Yes, I'm developing on a Linux system).  Digging for more
> information about that, I found this thread
> http://compgroups.net/comp.protocols.tcp-ip/udp-socket-sendto-eperm/2624182,
> where they talk about UDP and "transmit pacing".  Essentially they say it's
> the programmer's responsibility to not send UDP packets too fast.  Seems
> reasonable.
>
> So, my assumption is that the UDPTransport should be doing this.    I did
> look through some of the code, and while I did not see anything that would
> do this, that doesn't mean it's not there.  Is it?  Is the IOException
> (EPERM) causing "transmit pacing" or even normal retries of UDP to not work?
>
> More information:
> One test that fails (and the most SNMP active, I think), makes 5 SNMP
> requests to 3 different IP addresses.  The requests are all simple GET
> operations, totalling 8 OIDs in all.  Some of these requests are made in
> parallel by different threads.
>
> I'm very open to any other suggestions people want to make as to why this
> change would cause this behavior.  All help appreciated.
>
> David Corbin
> _______________________________________________
> SNMP4J mailing list
> SNMP4J at agentpp.org
> https://oosnmp.net/mailman/listinfo/snmp4j

-- 
---
AGENT++
Maximilian-Kolbe-Str. 10
73257 Koengen, Germany
https://agentpp.com
Phone: +49 7024 8688230
Fax:   +49 7024 8688231



