Category Archives: Memory Management

New Hoard release

I  just released a new version of Hoard that incorporates lots of changes (one of which I described in an earlier post).

This release incorporates a number of fixes and improvements, including:

  • Better per-thread allocation, improving speed for Unix platforms that do not support __thread (a fast way to do thread-specific data)
  • Added interception of Solaris threads API (thr_*) to work with older Solaris programs (the subject of the earlier post)
  • Fixed a rare race condition accidentally introduced in 3.7.1
  • Increased checking to protect against heap corruption or other errors
  • And Hoard now uses GNU-supported hooks on platforms with glibc (especially Linux), allowing it to work better with legacy programs (even emacs!).

I was wrong! And right!

Turns out I was wrong and right at the same time.

I thought that the problem Thomson Reuters had discovered (detailed in the last post) was that Hoard’s spinlock implementation for Sparc was somehow broken, possibly by a newer version of the Sparc architecture (e.g., by using a horrible relaxed memory model). See this info from Sun about such models, including TSO and (yikes) RMO.

Suffice it to say that relaxed memory ordering breaks things in addition to being absolutely awful to reason about. But luckily, apparently saner minds have prevailed and that memory ordering — while supported by the Sparc architecture — is never enabled. Phew.

Anyway, while chasing down the bug I discovered an “impossible” sequence of events (a race, but under the protection of a lock), and switching from spinlocks to POSIX locks (slower, but safe) solved the problem. Aha! Plainly, something wrong with the spinlock! But, it turns out, the spinlock code is perfectly fine. It’s practically identical to what Hans Boehm does in his atomics library.

Next time both Hans and I have done things the same way, I will assume that Hans probably got it right, so I’m OK. 🙂 The real source of the problem was elsewhere, but it points up the perils of supporting cross-platform software (and legacy systems).

First, a little background.

Hoard has an optimization specifically for single-threaded programs. Unless you spawn a thread, Hoard does not enable any locking code or atomic operations (locks become simple loads and stores). IIRC, this increases performance by several percentage points for some programs, so it’s nothing to sneeze at.

It’s simple enough: Hoard has a Boolean value called anyThreadCreated, which is initially false. The Hoard library intercepts the pthread_create call (on Windows, it does something else that has the same effect). Among other things, any call to create a thread immediately sets anyThreadCreated to true. The spinlock implementation then enables real locking.

As you can imagine, if somehow a program were to run multiple threads with locks disabled, that would be bad.

Enter Solaris threads.

It turns out that Solaris has not one but two thread APIs.  They were the predecessor to the now-familiar POSIX threads API. However, some code still uses this old, non-portable API. Notably, the code running at Thomson Reuters.

Yeah, I knew about Solaris threads, since I programmed with them “back in the day” (in the mid to late 90’s), but I overlooked them.

What was happening was that Hoard was not intercepting thr_create (Solaris threads). It thus assumed that no threads had been created, even though they had. So Thomson Reuters’ multithreaded code was running with the spinlocks disabled.

No wonder it crashed. It’s surprising it worked at all.

So, now Hoard now properly intercepts thr_create. Bug fixed. Life is good.

That said, I still think that exposing programmers to the vagaries of hardware memory models should be a felony offense.

Memory Models – Fuhgeddaboutit, Part 1

As promised, a brief article on Hoard. (For those of you who do not know, Hoard is a highly scalable, high-performance malloc replacement that gets a fair amount of use in The Real World. I wrote it and maintain it.)

You’ll see the reason for the title in a minute.

Thomson Reuters uses Hoard in their servers. They are currently using Hoard 2.1, which is kind of old (I released it in December 2001). The new version, Hoard 3.7.1, incorporates tons of improvements which I should probably write up someday, since people keep referencing the original Hoard paper from ASPLOS 2000, which was already out of date in 2001!

Anyway, while evaluating Hoard, the good folks at Thomson Reuters discovered that the new version of Hoard occasionally leads to crashes on their Sun systems.

You never know with these things, because a new memory allocator often shakes out application bugs that previously were latent (e.g., buffer overflows that now trash different things). But they came up with clearly bug-free code that reproduces the error, which they sent me. Convincing, and stupendously useful.

I tested the code on our big Niagara box, and sure enough, boom. But Hoard is in daily use by tons of folks, in lots of servers all around the world. What’s going on?

Even with the code, it took some time to track down. I shipped Thomson Reuters a patched version today, and will soon release version 3.7.2, incorporating these fixes.

Hint — both Hans Boehm and I have this wrong (so I am in very good company). Oh yeah, the title is a bit of a clue.