Dthreads: Efficient Deterministic Multithreading

Dthreads: Efficient Deterministic Multithreading,
Tongping Liu, Charlie Curtsinger, Emery D. Berger
SOSP 2011

[paper (PDF)][source code] [YouTube video of presentation][PPT slides]

Multithreaded programming is notoriously difficult to get right. A key problem is non-determinism, which complicates debugging, testing, and reproducing errors. One way to simplify multithreaded programming is to enforce deterministic execution, but current deterministic systems for C/C++ are incomplete or impractical. These systems require program modification, do not ensure determinism in the presence of data races, do not work with general-purpose multithreaded programs, or run up to 8.4× slower than pthreads.

This paper presents Dthreads, an efficient deterministic mul- tithreading system for unmodified C/C++ applications that replaces the pthreads library. Dthreads enforces determinism in the face of data races and deadlocks. Dthreads works by exploding multithreaded applications into multiple processes, with private, copy-on-write mappings to shared memory. It uses standard virtual memory protection to track writes, and deterministically orders updates by each thread. By separating updates from different threads, Dthreads has the additional benefit of eliminating false sharing. Experimental results show that Dthreads substantially outperforms a state-of-the-art deterministic runtime system, and for a majority of the benchmarks evaluated here, matches and occasionally exceeds the performance of pthreads.

 Related post: SHERIFF: Precise Detection and Automatic Mitigation of False Sharing


11 thoughts on “Dthreads: Efficient Deterministic Multithreading”

  1. Hi,

    I downloaded and tried out the package. My impression is
    that it’s an interesting idea but the library needs a little
    more work to be a good drop-in replacement. Also, it needs
    examples and tests to be a production system and gain users
    although I realize that this is was not your goal.

    1. only supported for linux/x86 but only 18 lines of inline assembly.

    $ grep “asm.*volatile” *
    determ.h: __asm__ __volatile__ (“mfence”);

    prof.cpp: asm volatile (“rdtsc”
    xatomic.h: asm volatile (“lock; xchgl %0, %1”

    xheap.h: //__asm__ __volatile__ (“mfence”: : :”memory”);

    $ grep -c “asm.*volatile” * | grep -v “:0”

    2. I tried a two simple test programs last night on my dual core laptop
    running Ubuntu-11.10.

    Compilation/linking was fine.
    1. worked and was deterministic:
    string output NOinterTmingled. 🙂
    2. appeared to deadlock.

    3. I decided to try something that was a better test of performance:
    http://code.google.com/p/stressapptest/ (this could be useful too!)

    This app linked but failed at run-time because their
    heap replacement, heaplayer, doesn’t implement memalign().
    I tried a quick hack (just call malloc()) but that didn’t work.
    I looked a doing a proper implementation but decided that I had
    spent enough time playing.

    4. I was disappointed that dthreads didn’t come with some test/example

      1. It’s deadlocking on malloc in main().
        I had a valid dot product threaded program that compile and ran fine with pthread but not dthreads. I stripped it down to the program show below. Am I doing something wrong? I could enable some debugging in heaplayers if you like.

        $ g++ -Wall -fpermissive -o ../dthread-malloc.dth ../dthread-malloc.c -L. -ldthread -ldl
        ../dthread-malloc.c: In function ‘int main(int, char**)’:
        ../dthread-malloc.c:26:46: warning: cast from ‘double*’ to ‘unsigned int’ loses precision [-fpermissive]
        ../dthread-malloc.c:26:63: warning: cast from ‘double*’ to ‘unsigned int’ loses precision [-fpermissive]

        $ LD_LIBRARY_PATH=. ltrace -f ../dthread-fail.dth
        [pid 10858] __libc_start_main(0x400758, 1, 0x7fffd39fa678, 0x400860, 0x4008f0
        [pid 10858] malloc(3200^C
        [pid 10858] — SIGINT (Interrupt) —
        [pid 10858] +++ killed by SIGINT +++



        #define NUMTHRDS 4
        #define VECLEN 100

        int main(int argc, char *argv[])
        double *a, *b;

        /* Assign storage and initialize values */
        a = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));
        b = (double*) malloc (NUMTHRDS*VECLEN*sizeof(double));

        if (a == NULL || b == NULL) {
        perror(“malloc in main.”);

        /* After joining, print out the results and cleanup */
        printf (“a = %x, b = %x \n”, (unsigned int)a, (unsigned int)b);

        free (a);
        free (b);


        $ LD_LIBRARY_PATH=. gdb ../dthread-fail.dth
        GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
        Copyright (C) 2011 Free Software Foundation, Inc.
        License GPLv3+: GNU GPL version 3 or later
        This is free software: you are free to change and redistribute it.
        There is NO WARRANTY, to the extent permitted by law. Type “show copying”
        and “show warranty” for details.
        This GDB was configured as “x86_64-linux-gnu”.
        For bug reporting instructions, please see:

        Reading symbols from /home/rmacleod/src/tools/debug/dthread-fail.dth…(no debugging symbols found)…done.
        (gdb) run
        Starting program: /home/rmacleod/src/tools/debug/dthread-fail.dth
        [Thread debugging using libthread_db enabled]
        Program received signal SIGINT, Interrupt.
        __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
        136 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
        in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S
        (gdb) bt
        #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
        #1 0x00007ffff740a1e5 in _L_lock_883 () from /lib/x86_64-linux-gnu/libpthread.so.0
        #2 0x00007ffff740a03a in __pthread_mutex_lock (mutex=0x7fff7a3cf000) at pthread_mutex_lock.c:61
        #3 0x00007ffff7bcadb6 in malloc () from ./libdthread.so
        #4 0x0000000000400780 in main ()
        (gdb) bt full
        #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
        No locals.
        #1 0x00007ffff740a1e5 in _L_lock_883 () from /lib/x86_64-linux-gnu/libpthread.so.0
        No symbol table info available.
        #2 0x00007ffff740a03a in __pthread_mutex_lock (mutex=0x7fff7a3cf000) at pthread_mutex_lock.c:61
        __PRETTY_FUNCTION__ = “__pthread_mutex_lock”
        type = 2050813952
        #3 0x00007ffff7bcadb6 in malloc () from ./libdthread.so
        No symbol table info available.
        #4 0x0000000000400780 in main ()
        No symbol table info available.

  2. Out of curiosity: how does DTHREADS determine when one thread’s updates to a page have ended, so as to block other threads’ access during it?

    1. Thread updates proceed in epochs: currently, Dthreads commits each thread’s updates deterministically at synchronization points (thread create / join / end, lock acquire / release, condition variable wait / signal / broadcast).

  3. This is a great idea. Could you expand what it means to “explode threads into processes” some more. The shared memory structures must mean that synchronization primitive handles are handles to these shared memory areas then? Is this implementation for C++ roughly homomorphic to any other non-C++-language’s deterministic threading capabilities? Does this approach owe anything to some other language or RTL library for some other platform?

    Warren Postma

    1. Hi Warren,

      In short, processes share private copy-on-write mappings to memory spaces (heap & globals). Dthreads hijacks all synchronization operations. Updates are committed in rounds. The PowerPoint presentation (best accompanied by the video) provides a nice overview.

      Dthreads’ implementation is sui generis; languages with language-level determinism don’t have to jump through as many hoops as we do with C/C++.

      Dthreads certainly builds on related work, which we discuss in the paper.

      — Emery

  4. Hi Emery,
    Thank you for providing the source code. 🙂 Could you please post some of the examples/test programs that you used?


Comments are closed.