[AGENT++] AGENT++ 4.1.2 the PTHREAD_MUTEX_ERRORCHECK does not work right?

Sat Oct 27 07:39:44 CEST 2018

Hi Frank

I tested with the newest code you released!

I work with different systems, but our target is the Integrity178b OS on an embedded system based on the PowerPC architecture.
I develop and test under Apple OSX, Linux, and MSYS2 (under Windows), and on native Windows too if I can't avoid it.

With a test suite, we can measure code coverage with TCOV and verify the functionality!
=====================================================

With cppcheck, the Clang static analyzer, … we check the code quality.
With clang-format it is much easier to enforce our code style.
With cmake, I have a portable build system generator.
With ctest, I run my test suite after the build.
We use pthreads, but with C++14 and newer we are moving to a simpler code base.

And we have problems with the AgentX++ subagent when using the queuing thread pool:
We need SNMP traps to be sent in the right order, and in some cases we have a huge burst of notifications.
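
A minimal sketch of one way to keep notifications in order during a burst: a single worker thread drains a FIFO queue, so items leave in exactly the order they were posted. This is an illustration only and not the AgentX++ thread pool API; the class name `OrderedSender`, the method `post()`, and the integer "trap ids" are hypothetical, and the `sink` vector stands in for the real send call.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration, not AGENT++/AgentX++ code.
class OrderedSender {
public:
    explicit OrderedSender(std::vector<int>& sink)
        : sink_(sink), done_(false), worker_(&OrderedSender::run, this) {}

    // Drains everything that is still queued, then joins the worker.
    ~OrderedSender()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Enqueue one notification; callers never block on the send itself.
    void post(int id)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(id);
        }
        cv_.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) {
                return;  // done_ is set and nothing is left to send
            }
            int id = queue_.front();
            queue_.pop_front();
            lock.unlock();
            sink_.push_back(id);  // stand-in for sending the trap
            lock.lock();
        }
    }

    std::vector<int>& sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<int> queue_;
    bool done_;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The trade-off is that one worker serializes all sends; with several workers the order is no longer guaranteed unless it is enforced some other way.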

There are also some other constraints:
The code must be portable (i386, PowerPC, …).
We run on a 32-bit system where sizeof(time_t) != sizeof(int).
The code must not have a resource leak.
The code has to be exception safe.
There should be no race conditions in the code.
We need the threads not to be deleted in the subagent mib->init() call.
We need each worker thread to have its own Integrity connection to the network stack (the connection count is limited!)
We need to be able to register the same MIB under different SNMPv3 context names.
We need a robust system that handles timeouts with the AgentX master agent.
And we have to work with the net-snmp master agent.
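
For the time_t constraint in the list above, a minimal sketch of a deadline computation that makes no assumption about sizeof(time_t) == sizeof(int). The helper name `add_timeout` is hypothetical and not part of AGENT++; it only shows the arithmetic of turning a millisecond timeout into an absolute `timespec`.

```cpp
#include <time.h>

// Hypothetical helper, not AGENT++ code: add a timeout in milliseconds to a
// base time. The division is done in unsigned long first, and only the
// resulting number of seconds is converted to time_t.
static timespec add_timeout(timespec base, unsigned long timeout_ms)
{
    const long nanos_per_sec = 1000000000L;

    base.tv_sec += static_cast<time_t>(timeout_ms / 1000UL);
    // At most 999999999 + 999000000, which still fits into a 32-bit long.
    base.tv_nsec += static_cast<long>(timeout_ms % 1000UL) * 1000000L;
    if (base.tv_nsec >= nanos_per_sec) {
        base.tv_sec += 1;
        base.tv_nsec -= nanos_per_sec;
    }
    return base;
}
```

A caller would fill `base` from clock_gettime(CLOCK_REALTIME, ...) or gettimeofday() and pass the result to pthread_cond_timedwait().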

The GHS compiler is a really bad, old one, so I normally try to test the code on the development host.
So we need an OS abstraction with stable behavior.

There are race conditions in the AGENT++ thread code and we see deadlocks and segmentation violations!
==================================================================================

For years, I have tried to follow your changes and back-port the improvements.

And most often, I try to fix things myself:
————————————————

@@ -485,7 +480,7 @@ bool Synchronized::lock(unsigned long timeout)
 
 #ifdef HAVE_CLOCK_GETTIME
     clock_gettime(CLOCK_REALTIME, &ts);
-    ts.tv_sec += (int)timeout / 1000;
+    ts.tv_sec += (time_t)timeout / 1000;
     int millis = ts.tv_nsec / 1000000 + (timeout % 1000);
     if (millis >= 1000) {
         ts.tv_sec += 1;
@@ -494,7 +489,7 @@ bool Synchronized::lock(unsigned long timeout)
 #else
     struct timeval tv;
     gettimeofday(&tv, 0);
-    ts.tv_sec  = tv.tv_sec + (int)timeout / 1000;
+    ts.tv_sec  = tv.tv_sec + (time_t)timeout / 1000;
     int millis = tv.tv_usec / 1000 + (timeout % 1000);
     if (millis >= 1000) {
         ts.tv_sec += 1;
@@ -630,7 +625,7 @@ Synchronized::TryLockResult Synchronized::trylock()
         LOG((long)this);
         LOG_END;
         return LOCKED;
-    } else if (err == EDEADLK) {
+    } else if ((isLocked) && (err == EBUSY)) {
         // This thread owns already the lock, but
         // we do not like recursive locking and print a warning!
         LOG_BEGIN(loggerModuleName, WARNING_LOG | 5);

———————————————————————————
# NOTE: time_t is not int!
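
A minimal sketch of the behaviour behind the last hunk, assuming a POSIX-conforming pthreads implementation: on a PTHREAD_MUTEX_ERRORCHECK mutex that the calling thread already holds, pthread_mutex_trylock() is specified to fail with EBUSY, while EDEADLK is what pthread_mutex_lock() returns in that case. The struct `RelockResult` and the function `relock_same_thread()` are hypothetical names for this demo and not AGENT++ code.

```cpp
#include <errno.h>
#include <pthread.h>

// Hypothetical demo types, not AGENT++ code.
struct RelockResult {
    int trylock_err;  // result of a second pthread_mutex_trylock()
    int lock_err;     // result of a second pthread_mutex_lock()
};

// Lock an error-checking mutex, then try to take it again from the same
// thread, once with trylock and once with lock.
static RelockResult relock_same_thread()
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    RelockResult result;
    int first = pthread_mutex_trylock(&mutex);        // takes the lock
    result.trylock_err = pthread_mutex_trylock(&mutex);
    result.lock_err = pthread_mutex_lock(&mutex);     // error check, no deadlock

    if (first == 0) {
        pthread_mutex_unlock(&mutex);
    }
    pthread_mutex_destroy(&mutex);
    return result;
}
```

On such a system a check for EDEADLK after trylock never fires, which is why the patch tests EBUSY together with the object's own isLocked flag.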

> On 27.10.2018 at 02:01, Frank Fock <fock at agentpp.com> wrote:
> 
> Why didn’t you name the systems which showed different behaviour? Which OS, version, AGENT++ version, compiler, and what differences, of course?
> 

I sent you all the relevant information in my first mail!

Best regards,
Claus