Dthreads: Efficient Deterministic Multithreading,
Tongping Liu, Charlie Curtsinger, Emery D. Berger
SOSP 2011
[paper (PDF)][source code] [YouTube video of presentation][PPT slides]
Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical. These systems require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads.
This paper presents Dthreads, an efficient deterministic mul- tithreading system for unmodified C/C++ applications that replaces the pthreads library. Dthreads enforces determinism in the face of data races and deadlocks. Dthreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, Dthreads has the additional benefit of eliminating false sharing. Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks evaluated here, matches and occasionally exceeds the performance of pthreads.
Related post: SHERIFF: Precise Detection and Automatic Mitigation of False Sharing
Interesting work. Is the code available?
Yes. The code for Dthreads, along with other projects from my group, can be downloaded from my software page.
Hi,
I downloaded and tried out the package. My impression is
that it’s an interesting idea but the library needs a little
more work to be a good drop-in replacement. Also, it needs
examples and tests to be a production system and gain users
although I realize that this is was not your goal.
1. only supported for linux/x86 but only 18 lines of inline assembly.
$ grep “asm.*volatile” *
determ.h: __asm__ __volatile__ (“mfence”);
…
prof.cpp: asm volatile (“rdtsc”
xatomic.h: asm volatile (“lock; xchgl %0, %1”
…
xheap.h: //__asm__ __volatile__ (“mfence”: : :”memory”);
$ grep -c “asm.*volatile” * | grep -v “:0”
determ.h:6
prof.cpp:1
xatomic.h:10
xheap.h:1
2. I tried a two simple test programs last night on my dual core laptop
running Ubuntu-11.10.
Compilation/linking was fine.
Run-time:
1. worked and was deterministic:
string output NOinterTmingled. 🙂
2. appeared to deadlock.
3. I decided to try something that was a better test of performance:
http://code.google.com/p/stressapptest/ (this could be useful too!)
This app linked but failed at run-time because their
heap replacement, heaplayer, doesn’t implement memalign().
I tried a quick hack (just call malloc()) but that didn’t work.
I looked a doing a proper implementation but decided that I had
spent enough time playing.
4. I was disappointed that dthreads didn’t come with some test/example
programs.
Hi,
Please send us your test program that appeared to deadlock. Thanks!
It’s deadlocking on malloc in main().
I had a valid dot product threaded program that compile and ran fine with pthread but not dthreads. I stripped it down to the program show below. Am I doing something wrong? I could enable some debugging in heaplayers if you like.
$ g++ -Wall -fpermissive -o ../dthread-malloc.dth ../dthread-malloc.c -L. -ldthread -ldl
../dthread-malloc.c: In function ‘int main(int, char**)’:
../dthread-malloc.c:26:46: warning: cast from ‘double*’ to ‘unsigned int’ loses precision [-fpermissive]
../dthread-malloc.c:26:63: warning: cast from ‘double*’ to ‘unsigned int’ loses precision [-fpermissive]
$ LD_LIBRARY_PATH=. ltrace -f ../dthread-fail.dth
[pid 10858] __libc_start_main(0x400758, 1, 0x7fffd39fa678, 0x400860, 0x4008f0
[pid 10858] malloc(3200^C
[pid 10858] — SIGINT (Interrupt) —
[pid 10858] +++ killed by SIGINT +++
#include
#include
#include
#include
#define NUMTHRDS 4
#define VECLEN 100
int main(int argc, char *argv[])
{
double *a, *b;
/* Assign storage and initialize values */
a = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));
b = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));
if (a == NULL || b == NULL) {
perror(“malloc in main.”);
exit(EXIT_FAILURE);
}
/* After joining, print out the results and cleanup */
printf (“a = %x, b = %x \n”, (unsigned int)a, (unsigned int)b);
free (a);
free (b);
exit(EXIT_SUCCESS);
}
$ LD_LIBRARY_PATH=. gdb ../dthread-fail.dth
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “x86_64-linux-gnu”.
For bug reporting instructions, please see:
…
Reading symbols from /home/rmacleod/src/tools/debug/dthread-fail.dth…(no debugging symbols found)…done.
(gdb) run
Starting program: /home/rmacleod/src/tools/debug/dthread-fail.dth
[Thread debugging using libthread_db enabled]
^C
Program received signal SIGINT, Interrupt.
__lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
136 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1 0x00007ffff740a1e5 in _L_lock_883 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff740a03a in __pthread_mutex_lock (mutex=0x7fff7a3cf000) at pthread_mutex_lock.c:61
#3 0x00007ffff7bcadb6 in malloc () from ./libdthread.so
#4 0x0000000000400780 in main ()
(gdb) bt full
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
No locals.
#1 0x00007ffff740a1e5 in _L_lock_883 () from /lib/x86_64-linux-gnu/libpthread.so.0
No symbol table info available.
#2 0x00007ffff740a03a in __pthread_mutex_lock (mutex=0x7fff7a3cf000) at pthread_mutex_lock.c:61
__PRETTY_FUNCTION__ = “__pthread_mutex_lock”
type = 2050813952
#3 0x00007ffff7bcadb6 in malloc () from ./libdthread.so
No symbol table info available.
#4 0x0000000000400780 in main ()
No symbol table info available.
(gdb)
Out of curiosity: how does DTHREADS determine when one thread’s updates to a page have ended, so as to block other threads’ access during it?
Thread updates proceed in epochs: currently, Dthreads commits each thread’s updates deterministically at synchronization points (thread create / join / end, lock acquire / release, condition variable wait / signal / broadcast).
This is a great idea. Could you expand what it means to “explode threads into processes” some more. The shared memory structures must mean that synchronization primitive handles are handles to these shared memory areas then? Is this implementation for C++ roughly homomorphic to any other non-C++-language’s deterministic threading capabilities? Does this approach owe anything to some other language or RTL library for some other platform?
Warren Postma
Hi Warren,
In short, processes share private copy-on-write mappings to memory spaces (heap & globals). Dthreads hijacks all synchronization operations. Updates are committed in rounds. The PowerPoint presentation (best accompanied by the video) provides a nice overview.
Dthreads’ implementation is sui generis; languages with language-level determinism don’t have to jump through as many hoops as we do with C/C++.
Dthreads certainly builds on related work, which we discuss in the paper.
— Emery
Hi Emery,
Thank you for providing the source code. 🙂 Could you please post some of the examples/test programs that you used?
Regards,
Abhishek
The programs we examined in the Dthreads paper are from the PARSEC and Phoenix benchmark suites. You can obtain these at the following sites:
http://parsec.cs.princeton.edu/
http://mapreduce.stanford.edu/
Best,
— Emery