[AGENT++] Crash in function MibLeaf::get_value where object value calls a pure virtual function

Jacquemin, Jean-Philippe jean-philippe.jacquemin at barco.com
Fri Feb 22 10:30:28 CET 2013


Hi Frank,

We still haven't found the bug. The hack is working well for the moment,
but we would still like to find the root cause.
Could you explain what the "synch" does?
Thanks for your help.

Jean-Philippe

-----Original Message-----
From: Frank Fock [mailto:fock at agentpp.com] 
Sent: Tuesday, February 12, 2013 12:19 AM
To: Jacquemin, Jean-Philippe
Cc: agentpp at agentpp.org
Subject: Re: [AGENT++] Crash in function MibLeaf::get_value where object
value calls a pure virtual function

Hello Jean-Philippe,

If the crash still occurs, then there must be another race condition or a
simple memory allocation error in your code.
Although there is always the possibility of a bug in AGENT++, no such
race condition or memory allocation fault has been reported in the kernel
of AGENT++ in the last 10 years.

Best regards,
Frank

On 11.02.2013 09:34, Jacquemin, Jean-Philippe wrote:
> Dear Frank,
>
> Thanks a lot for your quick answer.
> I had a look at the code again and compared it with the examples. I found
> out that we do lock and unlock the mib every single time we access it.
> However, we never use the "synch" of the MibEntry.
>
> Nevertheless, after having modified the code accordingly, we still 
> have the crash after approximately one day at the same place (in the 
> get_value()).
>
> The version of the agent with the unlock moved after the
> MibLeaf->unsynch() is still running and functional.
> (We are running a thread pool of 5 threads to process the SNMP
> requests.)
>
> So it seems that we get this crash due to a mibleaf being destroyed 
> while we are trying to access it.
>
> Does the synchronization of the leaf prevent it from being destroyed?
> We are not using dynamic tables and we do instantiate the complete mib
> at startup, so we should never be deleting any leaves...
>
> Best regards,
> Jean-Philippe
>
>
> -----Original Message-----
> From: agentpp-bounces at agentpp.org [mailto:agentpp-bounces at agentpp.org]
> On Behalf Of Frank Fock
> Sent: Thursday, February 07, 2013 11:28 PM
> To: agentpp at agentpp.org
> Subject: Re: [AGENT++] Crash in function MibLeaf::get_value where 
> object value calls a pure virtual function
>
> Hi,
>
> The crash is most likely caused by a missing synchronization in the 
> instrumentation code (your code).
>
> When accessing Mib data, always lock in this order:
>
> myMib->lock_mib();
>
> <search/lookup mibEntry>
>
> mibEntry->start_synch();
> myMib->unlock_mib();
>
> <do something with mibEntry>
>
> mibEntry->end_synch();
>
> The unlock is done early (right after mibEntry is locked) to allow 
> concurrent operations in the agent (safely).
>
> Best regards,
> Frank
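
The locking order described above can be sketched in standard C++ as
follows. This is only an illustration under stated assumptions: Entry,
Registry and read_entry are hypothetical stand-ins and not the AGENT++
classes. table_lock plays the role of lock_mib()/unlock_mib(), and
synch plays the role of start_synch()/end_synch().

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-ins, NOT AGENT++ types: Entry plays the role of a
// MibEntry, Registry the role of the Mib.
struct Entry {
    std::mutex synch;   // per-entry lock (role of start_synch/end_synch)
    int value = 0;
};

struct Registry {
    std::mutex table_lock;                 // role of lock_mib/unlock_mib
    std::map<std::string, Entry> entries;
};

// Reads one entry in the order: table lock, lookup, entry lock,
// table unlock, work on the entry, entry unlock.
// Returns false if the key is unknown.
bool read_entry(Registry& reg, const std::string& key, int& out) {
    std::unique_lock<std::mutex> table(reg.table_lock);
    auto it = reg.entries.find(key);
    if (it == reg.entries.end()) return false;     // table lock released here
    Entry& e = it->second;
    std::unique_lock<std::mutex> entry(e.synch);   // lock the entry first ...
    table.unlock();                                // ... then release the table early
    out = e.value;                                 // work on the entry
    return true;                                   // entry lock released here
}
```

Because the entry lock is taken before the table lock is released, there
is no window in which the entry has been found but is not yet protected,
and other threads can use the table while this one works on the entry.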
>
> On 07.02.2013 11:30, Jacquemin, Jean-Philippe wrote:
>> Dear All,
>>
>>    
>>
>> I am having a crash when porting my snmp agent to another CPU.
>>
>> This is due to a call to a method that is, at that moment, a pure
>> virtual.
>>
>> Here is the backtrace:
>>
>>    
>>
>> Program received signal SIGABRT, Aborted.
>> [Switching to LWP 1706]
>> 0x4825319c in raise () from /lib/libc.so.0
>> (gdb) where
>> #0  0x4825319c in raise () from /lib/libc.so.0
>> #1  0x4824da80 in abort () from /lib/libc.so.0
>> #2  0x4815f740 in __gnu_cxx::__verbose_terminate_handler() ()
>>     from /usr/lib/libstdc++.so.6
>> #3  0x4815b510 in ?? () from /usr/lib/libstdc++.so.6
>> #4  0x4815b564 in std::terminate() () from /usr/lib/libstdc++.so.6
>> #5  0x4815d0b8 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
>> #6  0x10033a04 in Vb::set_value (this=0x48293be8, val=...)
>>     at ../../../snmp++/include/snmp_pp/vb.h:161
>> #7  0x100460e8 in Agentpp::MibLeaf::get_value (this=0x10183110)
>>     at mib.cpp:342
>> #8  0x1004665c in Agentpp::MibLeaf::get_request (this=0x10183110,
>>     req=0x101dee08, ind=5) at mib.cpp:396
>> #9  0x1004f2d8 in Agentpp::MibTable::get_next_request (this=0x1017edb0,
>>     req=0x101dee08, ind=5) at mib.cpp:2075
>> #10 0x10056070 in Agentpp::Mib::process_request (this=0x10150d10,
>>     req=0x101dee08, reqind=5) at mib.cpp:3355
>> #11 0x10056a8c in Agentpp::Mib::do_process_request (this=0x10150d10,
>>     req=0x101dee08) at mib.cpp:3542
>> #12 0x1005e50c in Agentpp::MibTask::run (this=0x101722a8)
>>     at threads.cpp:957
>> #13 0x1005d660 in Agentpp::TaskManager::run (this=0x101522e0)
>>     at threads.cpp:779
>> #14 0x1005cd80 in Agentpp::thread_starter (t=0x10152330)
>>     at threads.cpp:488
>> #15 0x480223dc in ?? () from /lib/libpthread.so.0
>> #16 0x482529ec in clone () from /lib/libc.so.0
>>
>>    
>>
>>    
>>
>> Since the code is running fine on another CPU (same CPU family but
>> different resources and obviously speed), I suspect a race condition
>> that exposes an issue which has always been present.
>>
>> It seems that the variable binding object which holds the mibleaf
>> value is being destroyed while it is accessed here, which would result
>> in that call to the pure virtual function.
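
The reasoning above relies on how C++ dispatches virtual calls while an
object is being destroyed. A minimal single-threaded sketch of that rule
follows, assuming hypothetical classes Value and IntValue that are not
SNMP++ or AGENT++ types. Once the derived part is gone, a virtual call
resolves to the base class version. If that base version were pure
virtual, the call would typically end in the __cxa_pure_virtual handler
seen in frame #5.

```cpp
#include <string>

// Hypothetical classes for illustration only.
struct Value {
    std::string* log;   // where the destructor records what it observed
    explicit Value(std::string* l) : log(l) {}
    // Runs after the IntValue part has already been destroyed.
    virtual ~Value() { *log = describe(); }
    virtual std::string kind() const { return "Value"; }
    // Non-virtual helper that makes a virtual call.
    std::string describe() const { return kind(); }
};

struct IntValue : Value {
    using Value::Value;
    std::string kind() const override { return "IntValue"; }
};

// Returns the kind() result observed from inside ~Value().
std::string kind_seen_during_destruction() {
    std::string log;
    {
        IntValue v(&log);
        // At this point v.describe() dispatches to IntValue::kind().
    }   // v is destroyed: ~IntValue first, then ~Value
    return log;
}
```

The sketch stays within defined behaviour. Calling a member function from
one thread while another thread destroys the same object is undefined
behaviour, so the actual crash may show different symptoms.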
>>
>>    
>>
>> When looking at the code of function get_request, we noticed that the
>> mib is unlocked in the middle of the synchronization of "entry".
>>
>>     entry->start_synch();
>>     unlock_mib();
>>     entry->get_request(req, reqind);
>>     entry->end_synch();
>>
>>    
>>
>> This is probably done to avoid a deadlock. So we started to look in the
>> code for what could cause the deadlock and could not figure it out.
>>
>> So we moved the unlock to the end of this sequence, and since then it
>> works fine.
>>
>>    
>>
>> However, it seems that the unlock was put there specifically for a reason.
>>
>> Can anyone elaborate on that?
>>
>> Besides, if there is a possible deadlock, what would be the impact of
>> changing the mutex which is behind the mib lock function to recursive?
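
As background for the question about a recursive mutex, here is a small
sketch in standard C++. It assumes a hypothetical global mib_lock and
helper functions outer() and inner(); how AGENT++ implements its mib lock
is not shown here. A recursive mutex lets the thread that already owns it
lock it again. It does not change what other threads may do, so it only
helps when the same thread re-enters the lock.

```cpp
#include <mutex>

// Hypothetical stand-in for the lock behind the mib lock function.
std::recursive_mutex mib_lock;

// Takes the lock and reports one level of nesting.
int inner() {
    std::lock_guard<std::recursive_mutex> guard(mib_lock);
    return 1;
}

// Takes the lock, then calls inner(), which takes it again on the same
// thread. With a plain std::mutex this re-entry would be undefined
// behaviour and would typically hang.
int outer() {
    std::lock_guard<std::recursive_mutex> guard(mib_lock);
    return 1 + inner();
}
```

Each lock needs a matching unlock, which the lock_guard objects handle
here when they go out of scope.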
>>
>>    
>>
>> One more piece of information: the crash is happening with the agent
>> running on the new CPU, which now runs a 3.x kernel, while the old
>> CPU was using a 2.6.x kernel.
>>
>>    
>>
>> Thanks a lot,
>>
>>    
>>
>> Regards,
>>
>> Jean-Philippe
>>
>>    
>>
>>    
>>
>>
>>
>> DISCLAIMER:
>> Unless indicated otherwise, the information contained in this message
>> is privileged and confidential, and is intended only for the use of
>> the addressee(s) named above and others who have been specifically
>> authorized to receive it. If you are not the intended recipient, you
>> are hereby notified that any dissemination, distribution or copying of
>> this message and/or attachments is strictly prohibited. The company
>> accepts no liability for any damage caused by any virus transmitted by
>> this email. Furthermore, the company does not warrant a proper and
>> complete transmission of this information, nor does it accept
>> liability for any delays. If you have received this message in error,
>> please contact the sender and delete the message. Thank you.
>> _______________________________________________
>> AGENTPP mailing list
>> AGENTPP at agentpp.org
>> http://lists.agentpp.org/mailman/listinfo/agentpp

--
---
AGENT++
Maximilian-Kolbe-Str. 10
73257 Koengen, Germany
https://agentpp.com
Phone: +49 7024 8688230
Fax:   +49 7024 8688231


