New Hoard release

I just released a new version of Hoard that incorporates lots of changes (one of which I described in an earlier post).

This release incorporates a number of fixes and improvements, including:

  • Better per-thread allocation, improving speed on Unix platforms that do not support __thread (a fast mechanism for thread-specific data)
  • Interception of the Solaris threads API (thr_*), so Hoard works with older Solaris programs (the subject of the earlier post)
  • A fix for a rare race condition accidentally introduced in 3.7.1
  • Increased internal checking to protect against heap corruption and other errors
  • Use of GNU-supported hooks on platforms with glibc (especially Linux), allowing Hoard to work better with legacy programs (even emacs!)

Solution: Crashes When Optimized

In the previous post, I described a “puzzler”. Here’s the brief version:

After testing the code (debug build), and verifying that it worked fine, I re-ran the code in an optimized build, just to make sure that I hadn’t inadvertently introduced any performance regressions.

Instant segfault. For every benchmark.

In sum – crashes when optimized:

  • Everything works fine in debug builds.
  • Complete disaster in optimized builds.

Below is a code snippet that demonstrates the problem: I had forgotten to return a value from a function (see gimme()). In unoptimized mode (as Daniel Jiménez correctly guessed), the code happened to work, because intermediates and locals were placed on the stack, so the value that gimme() should have returned was still sitting where the caller picked it up. In optimized mode, those values are stashed in registers, and the register holding the pointer gets overwritten. Kudos to Tongping for tracking this bug down.

Why g++ does not generate warnings for this code without adding “-Wall” is beyond me.

Moral? Always compile with “-Wall”…

#include <iostream>
using namespace std;

static int x = 100;
static int * xptr = &x;

void * dat (void) {
  void * p = xptr;
  return p;
}

void * gimme (void) {
  void * p = dat();
  // Oops: no return statement!
}

int main()
{
  cout << *((int *) gimme()) << endl;
  return 0;
}
A Puzzler: Crashes When Optimized

I came across a pretty cool bug that threw me for a loop. In the spirit of Car Talk, I’m turning this into a puzzler.

We have developed some cool performance enhancements for Grace (a runtime system that enables safe multithreaded C/C++ programming). To make it easier to selectively turn them on and off, I decided to reorganize some of the code. Not a big deal – essentially splitting a few classes into some component classes and moving code around. Pretty routine stuff.

And then something odd happened.

After testing the code (debug build), and verifying that it worked fine, I re-ran the code in an optimized build, just to make sure that I hadn’t inadvertently introduced any performance regressions.

Instant segfault. For every benchmark.

In sum – crashes when optimized:

  • Everything works fine in debug builds.
  • Complete disaster in optimized builds.

Enter your guesses below.

I was wrong! And right!

Turns out I was wrong and right at the same time.

I thought that the problem Thomson Reuters had discovered (detailed in the last post) was that Hoard’s spinlock implementation for Sparc was somehow broken, possibly by a newer version of the Sparc architecture (e.g., by using a horrible relaxed memory model). See this info from Sun about such models, including TSO and (yikes) RMO.

Suffice it to say that relaxed memory ordering breaks things in addition to being absolutely awful to reason about. But luckily, apparently saner minds have prevailed and that memory ordering — while supported by the Sparc architecture — is never enabled. Phew.

Anyway, while chasing down the bug I discovered an “impossible” sequence of events (a race, but under the protection of a lock), and switching from spinlocks to POSIX locks (slower, but safe) solved the problem. Aha! Plainly, something wrong with the spinlock! But, it turns out, the spinlock code is perfectly fine. It’s practically identical to what Hans Boehm does in his atomics library.

Next time both Hans and I have done things the same way, I will assume that Hans probably got it right, so I’m OK. 🙂 The real source of the problem was elsewhere, but it points up the perils of supporting cross-platform software (and legacy systems).

First, a little background.

Hoard has an optimization specifically for single-threaded programs. Unless you spawn a thread, Hoard does not enable any locking code or atomic operations (locks become simple loads and stores). IIRC, this increases performance by several percentage points for some programs, so it’s nothing to sneeze at.

It’s simple enough: Hoard has a Boolean value called anyThreadCreated, which is initially false. The Hoard library intercepts the pthread_create call (on Windows, it does something else that has the same effect). Among other things, any call to create a thread immediately sets anyThreadCreated to true. The spinlock implementation then enables real locking.

As you can imagine, if somehow a program were to run multiple threads with locks disabled, that would be bad.
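The mechanism described above can be sketched roughly as follows. This is a simplified illustration, not Hoard's actual code: the class and member names are invented, and it uses modern C++ atomics where Hoard of that era used hand-rolled assembly. Only the flag's name, anyThreadCreated, comes from the post.

```cpp
#include <atomic>

// Set to true by the thread-creation wrappers; until then, every
// "lock" operation is just a load of this flag.
volatile bool anyThreadCreated = false;

class SpinLock {
public:
  void lock() {
    if (!anyThreadCreated) {
      return;  // single-threaded so far: locking is a no-op
    }
    while (_flag.test_and_set(std::memory_order_acquire)) {
      // spin until the lock is released
    }
  }

  void unlock() {
    if (!anyThreadCreated) {
      return;
    }
    _flag.clear(std::memory_order_release);
  }

private:
  std::atomic_flag _flag = ATOMIC_FLAG_INIT;
};
```

The hazard the post describes lives exactly here: if a thread is created through an API the wrappers do not intercept, anyThreadCreated stays false, and multiple threads sail through these no-op locks.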

Enter Solaris threads.

It turns out that Solaris has not one but two threads APIs. Solaris threads were the predecessor of the now-familiar POSIX threads API. However, some code still uses this old, non-portable API. Notably, the code running at Thomson Reuters.

Yeah, I knew about Solaris threads, since I programmed with them “back in the day” (in the mid to late 90’s), but I overlooked them.

What was happening was that Hoard was not intercepting thr_create (Solaris threads). It thus assumed that no threads had been created, even though they had. So Thomson Reuters’ multithreaded code was running with the spinlocks disabled.

No wonder it crashed. It’s surprising it worked at all.

So now Hoard properly intercepts thr_create. Bug fixed. Life is good.
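The fix can be sketched as a library-interposition wrapper. This is a hypothetical illustration, not Hoard's source: since thr_create exists only on Solaris, the sketch interposes on pthread_create instead (the mechanism is the same), and anyThreadCreated is simply the flag named earlier.

```cpp
#include <dlfcn.h>    // dlsym, RTLD_NEXT (GNU extension; g++ defines _GNU_SOURCE)
#include <pthread.h>

volatile bool anyThreadCreated = false;

// Interpose on the thread-creation call: flip the flag, then delegate
// to the real function found via RTLD_NEXT. Hoard now does the
// equivalent for thr_create on Solaris.
extern "C" int pthread_create(pthread_t * thread,
                              const pthread_attr_t * attr,
                              void * (*start)(void *),
                              void * arg) noexcept {
  anyThreadCreated = true;  // from now on, spinlocks really lock
  typedef int (*create_fn)(pthread_t *, const pthread_attr_t *,
                           void * (*)(void *), void *);
  static create_fn real_create =
      (create_fn) dlsym(RTLD_NEXT, "pthread_create");
  return real_create(thread, attr, start, arg);
}
```

Miss one such entry point, as the post recounts, and the flag never flips for programs that use only that API.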

That said, I still think that exposing programmers to the vagaries of hardware memory models should be a felony offense.

Memory Models – Fuhgeddaboutit, Part 1

As promised, a brief article on Hoard. (For those of you who do not know, Hoard is a highly scalable, high-performance malloc replacement that gets a fair amount of use in The Real World. I wrote it and maintain it.)

You’ll see the reason for the title in a minute.

Thomson Reuters uses Hoard in their servers. They are currently using Hoard 2.1, which is kind of old (I released it in December 2001). The new version, Hoard 3.7.1, incorporates tons of improvements which I should probably write up someday, since people keep referencing the original Hoard paper from ASPLOS 2000, which was already out of date in 2001!

Anyway, while evaluating Hoard, the good folks at Thomson Reuters discovered that the new version of Hoard occasionally leads to crashes on their Sun systems.

You never know with these things, because a new memory allocator often shakes out application bugs that previously were latent (e.g., buffer overflows that now trash different things). But they came up with clearly bug-free code that reproduces the error, which they sent me. Convincing, and stupendously useful.

I tested the code on our big Niagara box, and sure enough, boom. But Hoard is in daily use by tons of folks, in lots of servers all around the world. What’s going on?

Even with the code, it took some time to track down. I shipped Thomson Reuters a patched version today, and will soon release version 3.7.2, incorporating these fixes.

Hint — both Hans Boehm and I have this wrong (so I am in very good company). Oh yeah, the title is a bit of a clue.

Emery Berger's Blog
