[AGENT++] Crash in function MibLeaf::get_value where object value calls a pure virtual function

Frank Fock fock at agentpp.com
Tue Feb 12 00:19:10 CET 2013


Hello Jean-Philippe,

If the crash still occurs, then there must be another race condition
or a simple memory allocation error in your code.
Although it is always possible that there is a bug
in AGENT++, no such race condition or memory allocation
fault has been reported in the kernel of AGENT++
in the last 10 years.

Best regards,
Frank

On 11.02.2013 09:34, Jacquemin, Jean-Philippe wrote:
> Dear Frank,
>
> Thanks a lot for your quick answer.
> I had a look at the code again and compared it with the examples. I found
> that we do lock and unlock the mib every single time we access it.
> However, we never use the "synch" of the MibEntry.
>
> Nevertheless, after having modified the code accordingly, we still have
> the crash after approximately one day at the same place (in the
> get_value()).
>
> The version of the agent with the unlock moved after the
> MibLeaf->unsynch() is still running and functional.
> (We are running a thread pool of 5 threads to process the SNMP
> requests.)
>
> So it seems that we get this crash due to a mibleaf being destroyed
> while we are trying to access it.
>
> Does the synchronization of the leaf prevent it from being destroyed?
> We are not using dynamic tables and we do instantiate the complete mib
> at startup, so we should never be deleting any leaves...
>
> Best regards,
> Jean-Philippe
>
>
> -----Original Message-----
> From: agentpp-bounces at agentpp.org [mailto:agentpp-bounces at agentpp.org]
> On Behalf Of Frank Fock
> Sent: Thursday, February 07, 2013 11:28 PM
> To: agentpp at agentpp.org
> Subject: Re: [AGENT++] Crash in function MibLeaf::get_value where object
> value calls a pure virtual function
>
> Hi,
>
> The crash is most likely caused by missing synchronization in the
> instrumentation code (your code).
>
> When accessing Mib data, always lock in this order:
>
> myMib->lock_mib();
>
> <search/lookup mibEntry>
>
> mibEntry->start_synch();
> myMib->unlock_mib();
>
> <do something with mibEntry>
>
> mibEntry->end_synch();
>
> The unlock is done early (right after mibEntry is locked)
> to allow concurrent operations in the agent (safely).
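The locking order described above can be sketched in standard C++. This is only an illustration under stated assumptions: `Registry`, `Entry`, and `read_entry` are hypothetical stand-ins, not AGENT++ classes. `Registry::lock` plays the role of `lock_mib()`/`unlock_mib()`, and `Entry::synch` plays the role of `start_synch()`/`end_synch()`.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-ins for illustration only; NOT the AGENT++ API.
struct Entry {
    std::mutex synch;  // plays the role of start_synch()/end_synch()
    int value = 0;
};

struct Registry {
    std::mutex lock;   // plays the role of lock_mib()/unlock_mib()
    std::map<std::string, Entry> entries;
};

// Reads one entry using the order from the message above:
// lock container, look up, lock entry, unlock container, work, unlock entry.
// Returns false if the key is not present.
bool read_entry(Registry& reg, const std::string& key, int& out) {
    std::unique_lock<std::mutex> reg_guard(reg.lock);   // 1. lock the container
    auto it = reg.entries.find(key);                    // 2. look up under that lock
    if (it == reg.entries.end()) {
        return false;                                   // reg_guard unlocks on return
    }
    Entry& entry = it->second;
    std::unique_lock<std::mutex> entry_guard(entry.synch);  // 3. lock the entry first
    reg_guard.unlock();                                 // 4. then release the container
    out = entry.value;                                  // 5. work on the entry
    return true;                                        // 6. entry_guard unlocks here
}
```

With a `Registry` holding an entry `"a"` whose value is 42, `read_entry(reg, "a", out)` returns true and sets `out` to 42. The sketch only protects the entry if every other code path that modifies or removes entries takes the same two locks in the same order.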
>
> Best regards,
> Frank
>
> On 07.02.2013 11:30, Jacquemin, Jean-Philippe wrote:
>> Dear All,
>>
>> I am having a crash when porting my snmp agent to another CPU.
>> This is due to a call to a method that is, at that time, pure
>> virtual.
>>
>> Here is the back trace:
>>
>> Program received signal SIGABRT, Aborted.
>> [Switching to LWP 1706]
>> 0x4825319c in raise () from /lib/libc.so.0
>> (gdb) where
>> #0  0x4825319c in raise () from /lib/libc.so.0
>> #1  0x4824da80 in abort () from /lib/libc.so.0
>> #2  0x4815f740 in __gnu_cxx::__verbose_terminate_handler() ()
>>       from /usr/lib/libstdc++.so.6
>> #3  0x4815b510 in ?? () from /usr/lib/libstdc++.so.6
>> #4  0x4815b564 in std::terminate() () from /usr/lib/libstdc++.so.6
>> #5  0x4815d0b8 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
>> #6  0x10033a04 in Vb::set_value (this=0x48293be8, val=...)
>>       at ../../../snmp++/include/snmp_pp/vb.h:161
>> #7  0x100460e8 in Agentpp::MibLeaf::get_value (this=0x10183110)
>>       at mib.cpp:342
>> #8  0x1004665c in Agentpp::MibLeaf::get_request (this=0x10183110,
>>       req=0x101dee08, ind=5) at mib.cpp:396
>> #9  0x1004f2d8 in Agentpp::MibTable::get_next_request (this=0x1017edb0,
>>       req=0x101dee08, ind=5) at mib.cpp:2075
>> #10 0x10056070 in Agentpp::Mib::process_request (this=0x10150d10,
>>       req=0x101dee08, reqind=5) at mib.cpp:3355
>> #11 0x10056a8c in Agentpp::Mib::do_process_request (this=0x10150d10,
>>       req=0x101dee08) at mib.cpp:3542
>> #12 0x1005e50c in Agentpp::MibTask::run (this=0x101722a8)
>>       at threads.cpp:957
>> #13 0x1005d660 in Agentpp::TaskManager::run (this=0x101522e0)
>>       at threads.cpp:779
>> #14 0x1005cd80 in Agentpp::thread_starter (t=0x10152330)
>>       at threads.cpp:488
>> #15 0x480223dc in ?? () from /lib/libpthread.so.0
>> #16 0x482529ec in clone () from /lib/libc.so.0
>>
>> Since the code is running fine on another CPU (same CPU family but
>> different resources and obviously speed), I suspect a race
>> condition that now exposes an issue that has always been present.
>>
>> It seems that the variable binding object which holds the MibLeaf
>> value is being destroyed while it is accessed here, which would result
>> in that call to the pure virtual function.
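The link between destruction and a pure virtual call can be shown with a small standard C++ sketch. `Base`, `Derived`, and `name_seen_during_destruction` are hypothetical names, not the SNMP++ class hierarchy. The sketch uses a non-pure virtual function so that it stays well defined: once the derived part of an object has been destroyed, virtual calls dispatch to the base class version.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes for illustration only; not the SNMP++ hierarchy.
struct Base {
    std::string* log;  // where the destructor records which override it reached
    explicit Base(std::string* l) : log(l) {}
    // Virtual call from the base destructor: the Derived part is already gone,
    // so this resolves to Base::name(), never to Derived::name().
    virtual ~Base() { *log = name(); }
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    using Base::Base;
    std::string name() const override { return "Derived"; }
};

// Returns the name that the base destructor observed.
std::string name_seen_during_destruction() {
    std::string seen;
    {
        Derived d(&seen);
        Base& ref = d;
        assert(ref.name() == "Derived");  // alive: dispatches to Derived
    }  // ~Derived runs first, then ~Base calls name()
    return seen;
}
```

Here `name_seen_during_destruction()` returns "Base". If `Base::name` were pure virtual instead, the same dispatch would have no implementation to reach; that is undefined behavior, and with libstdc++ it typically ends in `__cxa_pure_virtual` and `std::terminate`, as in frames #5 and #4 of the back trace. A second thread calling through a pointer to an object that is being destroyed, or already has been, can end up in the same place.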
>>
>> When looking at the code of the function get_request, we noticed that the
>> mib is unlocked in the middle of the synchronization of "entry":
>>
>>     entry->start_synch();
>>     unlock_mib();
>>     entry->get_request(req, reqind);
>>     entry->end_synch();
>>
>> This is probably done to avoid a deadlock. So we started to look in the
>> code for what could cause the deadlock, and could not figure it out.
>>
>> So we moved the unlock to the end of this sequence, and since then it
>> works fine.
>>
>> However, it seems that the unlock was put there for a specific reason.
>> Can anyone elaborate on that?
>>
>> Besides, if there is a possible deadlock, what would be the impact of
>> changing the mutex which is behind the mib lock function to recursive?
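For context on the recursive mutex question, here is a minimal sketch of what "recursive" changes in standard C++. `table_lock` and `nested_depth` are hypothetical names, and nothing here says how the mib lock is actually implemented in AGENT++. A recursive mutex lets the thread that already owns the lock take it again; it does not change how two different threads contend for it.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical lock standing in for a container-wide lock; not AGENT++ code.
std::recursive_mutex table_lock;

// Takes the lock again at every level of recursion and returns the depth reached.
// With a plain std::mutex, the second lock by the same thread would be
// undefined behavior (in practice, usually a self-deadlock).
int nested_depth(int levels) {
    std::lock_guard<std::recursive_mutex> guard(table_lock);
    if (levels == 0) {
        return 0;
    }
    return 1 + nested_depth(levels - 1);
}
```

Calling `nested_depth(3)` returns 3, and the lock is fully released only when the outermost guard goes out of scope. A deadlock caused by two threads taking two locks in opposite order is unaffected by making either lock recursive.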
>>
>> One more piece of information: the crash is happening with the agent
>> running on the new CPU, which now runs a 3.x kernel, while the old
>> CPU was using a 2.6.x kernel.
>>
>> Thanks a lot,
>>
>> Regards,
>> Jean-Philippe
>>

-- 
---
AGENT++
Maximilian-Kolbe-Str. 10
73257 Koengen, Germany
https://agentpp.com
Phone: +49 7024 8688230
Fax:   +49 7024 8688231


