Possible data race in Valgrind

Question

Has anyone seen "Possible data race" messages that point at the mutex lock/unlock functions when running a program under Valgrind? The man page seems to say clearly that Valgrind works well with POSIX threads, but it clearly doesn't like the pthread_mutex_lock / pthread_mutex_unlock functions. Or am I missing something?

Possible data race during write of size 4 at 0x64BC34 by thread #1
==11521== Locks held: none
==11521==    at 0x3A60A1: pthread_mutex_lock (in /lib/libthr.so.3)
==11521==    by 0x5C48E: pthread_mutex_lock (in /usr/local/lib/valgrind/vgpreload_helgrind-x86-freebsd.so)
==11521==    by 0x8052651: md_mutex_operation (md_pthread.c:92)
==11521==    by 0x804D9ED: md_prt (md_funcs.c:49)
==11521==    by 0x804DFD6: md_freepthr_create (md_funcs.c:268)
==11521==    by 0x804B031: main (ml_shd.c:374)
==11521==
==11521== This conflicts with a previous write of size 4 by thread #2
==11521== Locks held: none
==11521==    at 0x3A686A: ??? (in /lib/libthr.so.3)
==11521==    by 0x5C88E: pthread_mutex_unlock (in /usr/local/lib/valgrind/vgpreload_helgrind-x86-freebsd.so)
==11521==    by 0x805278F: md_mutex_operation (md_pthread.c:118)
==11521==    by 0x804DAA4: md_prt (md_funcs.c:68)
==11521==    by 0x8054EEA: md_freepthr_func (md_pthread.c:1193)
==11521==    by 0x5F245: ??? (in /usr/local/lib/valgrind/vgpreload_helgrind-x86-freebsd.so)
==11521==    by 0x39F4B9: ??? (in /lib/libthr.so.3)
==11521==
==11521== Address 0x64BC34 is 52 bytes inside a block of size 60 alloc'd
==11521==    at 0x5B6FA: calloc (in /usr/local/lib/valgrind/vgpreload_helgrind-x86-freebsd.so)
==11521==    by 0x3A596C: ??? (in /lib/libthr.so.3)
==11521==    by 0x5EF68: pthread_mutex_init (in /usr/local/lib/valgrind/vgpreload_helgrind-x86-freebsd.so)
==11521==    by 0x804C4D2: md_conninit (md_cfg.c:189)
==11521==    by 0x804A19E: main (ml_shd.c:62)
==11521==
==11521== (action on error) vgdb me ...
==11521== Continuing ...

Update:

#define MD_PRT(dlevel,fmt,...) \
    md_prt(dlevel,"%s (%s:%d): " fmt,\
           pname,__func__,__LINE__,##__VA_ARGS__)

void md_prt(uint32_t dlevel,const char *fmt, ...)
{
    va_list vp;

    md_mutex_operation(&(p_conn->log_mutex),MD_LOCK);
    va_start(vp,fmt);
    vsyslog(LOG_DEBUG,fmt,vp);
    va_end(vp);
    md_mutex_operation(&(p_conn->log_mutex),MD_UNLOCK);
    return;
}

int md_mutex_operation(p_mutex, operation)
pthread_mutex_t *p_mutex;
int operation;
{
    int res = 0;
    time_t sec;
    struct timespec tim;
    tim.tv_nsec = 0;
#ifdef ML_DEBUG
    char str[20];
#endif

    switch(operation){
    case MD_LOCK:
        res = pthread_mutex_lock(p_mutex);
#ifdef ML_DEBUG
        if(res)
            MD_PRT(MD_PRT_THREAD,"Error %d of locking mutex (%s, pthread %u)",
                   res,str,pthread_self());
#endif
        break;
    case MD_TRYLOCK:
        res = pthread_mutex_trylock(p_mutex);
#ifdef ML_DEBUG
        if(res)
            MD_PRT(MD_PRT_THREAD,"Error %d of trying lock mutex (%s, pthread %u)",
                   res,str,pthread_self());
#endif
        break;
    case MD_TIMLOCK:
        time(&sec);
        tim.tv_sec = sec + MD_LOCK_TIMEOUT;
        res = pthread_mutex_timedlock(p_mutex,&tim);
#ifdef ML_DEBUG
        if(res)
            MD_PRT(MD_PRT_THREAD,"Error %d of time locking mutex (%s, pthread %u)",
                   res,str,pthread_self());
#endif
        break;
    case MD_UNLOCK:
        res = pthread_mutex_unlock(p_mutex);
#ifdef ML_DEBUG
        if(res)
            MD_PRT(MD_PRT_THREAD,"Error %d of unlocking mutex (%s, pthread %u)",
                   res,str,pthread_self());
#endif
        break;
    default :
        break;
    }
    return(res);
}

More likely what it dislikes is this: "... write of size 4 by thread # ... ==11521== Locks held: none".
Yes, but pthread_mutex_lock and pthread_mutex_unlock are at the top of the call stack. The md_prt() function acquires the mutex, writes a message to the log, and releases the mutex.
No static or global variables are used apart from the mutex itself, so it is completely unclear to me what kind of conflict it has found.
Maybe I'm completely wrong, but it seems to me that at this point md_mutex_operation (md_pthread.c:118) should have called lock, not unlock.
Or else lock was not called somewhere earlier on the way here.
- I have never worked with Valgrind, but I suspect that this debugger, while tracing the threads, caught a simultaneous write (or one not protected by any lock, which is what "Locks held: none" says) to an address inside the block obtained (which is interesting, by the way) from pthread_mutex_init, called from md_conninit (md_cfg.c:189).
Yes, the strange thing is exactly that Valgrind points at a variable whose memory is allocated in pthread_mutex_init.
md_mutex_operation is just my wrapper around the lock/unlock calls that checks the result of the operation.
@margosh, judging by the thread stacks, this remark is unrelated to the error, but IMHO it is strange to call MD_PRT() from md_mutex_operation().
Look: if a mutex operation fails, you want to print the error, and in doing so you indirectly perform another operation on (possibly the same) mutex.
I would make the mutex in md_prt a local static (the log you are protecting is shared by everyone, right?). I don't understand why it lives in *p_conn.
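The last suggestion in the comments above could look roughly like the sketch below. It is not the asker's code: log_msg(), checked_lock(), checked_unlock() and demo_logging() are hypothetical stand-ins for md_prt() and md_mutex_operation(), and vfprintf() to stderr replaces vsyslog() so the example needs no syslog setup. The point it illustrates is that the log mutex is a function-local static, and that the lock wrapper reports failures without going back through the logger, so a failed operation on the log mutex cannot trigger another operation on the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for md_prt(): the log mutex is a function-local
 * static, so it does not depend on *p_conn having been initialized. */
void log_msg(const char *fmt, ...)
{
    static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
    va_list vp;

    pthread_mutex_lock(&log_mutex);
    va_start(vp, fmt);
    vfprintf(stderr, fmt, vp);      /* the original calls vsyslog() here */
    va_end(vp);
    pthread_mutex_unlock(&log_mutex);
}

/* Hypothetical stand-ins for md_mutex_operation(): on failure they write
 * straight to stderr and never call log_msg(), so they never touch the
 * log mutex themselves. */
int checked_lock(pthread_mutex_t *m)
{
    int res = pthread_mutex_lock(m);
    if (res != 0)
        fprintf(stderr, "pthread_mutex_lock failed: error %d\n", res);
    return res;
}

int checked_unlock(pthread_mutex_t *m)
{
    int res = pthread_mutex_unlock(m);
    if (res != 0)
        fprintf(stderr, "pthread_mutex_unlock failed: error %d\n", res);
    return res;
}

/* Small self-check: log a message while holding an unrelated mutex. */
int demo_logging(void)
{
    static pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
    int rc;

    rc = checked_lock(&data_mutex);
    assert(rc == 0);
    log_msg("logging while holding an unrelated mutex\n");
    rc = checked_unlock(&data_mutex);
    assert(rc == 0);
    return rc;
}
```

Whether this changes what Helgrind reports is a separate question; the sketch only removes the re-entrancy concern raised in the comments.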

mikelsv · Answer 1 · 2012-11-14T19:10:04

Practice shows that wherever Valgrind points is where you should look for the error. Valgrind works fine with mutexes. I would look at the code that works with them; maybe there is a mistake somewhere in it.

You can try temporarily removing everything you can, leaving only the mutexes, and see how it behaves.
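A stripped-down test of that kind might look like the sketch below. Everything in it is an assumption made for illustration (the names worker, run_mutex_only_test, N_THREADS and N_ITER are not from the original program): two threads increment a shared counter, and the only synchronization is a mutex created with pthread_mutex_init, as in the original.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS 2
#define N_ITER    1000

static pthread_mutex_t mtx;   /* initialized with pthread_mutex_init() below */
static long counter;          /* shared data, only touched while mtx is held */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_ITER; i++) {
        pthread_mutex_lock(&mtx);
        counter++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

/* Starts the workers, joins them, and returns the final counter value,
 * or -1 if the mutex or any thread could not be created. */
long run_mutex_only_test(void)
{
    pthread_t tid[N_THREADS];
    int started = 0;
    int rc;

    counter = 0;
    if (pthread_mutex_init(&mtx, NULL) != 0)
        return -1;

    while (started < N_THREADS) {
        if (pthread_create(&tid[started], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    rc = pthread_mutex_destroy(&mtx);
    assert(rc == 0);
    (void)rc;   /* keeps the build quiet when NDEBUG disables assert */

    return started == N_THREADS ? counter : -1;
}
```

Built with -pthread and called from a small main(), it can be run as `valgrind --tool=helgrind ./a.out`. If Helgrind still reports races inside the threading library for a program this small, the reports are unlikely to come from the application code.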

If you have the opportunity, try to get rid of Valgrind, please.

margosh · Accepted Answer · 2012-11-15T12:42:36

Hooray! I reinstalled Valgrind and found this message during installation:

Known problems:
1) DRD / Helgrind tool gives a false positive for the internals of the pthreads library. This is now under investigation.
2) exp-ptrcheck tool doesn't work on FreeBSD now
If you’re not

I think this is what my problem is about. Thank you all for your help, avp especially :) (I will take into account your comment about a possible repeated attempt to acquire the lock.)
