Category Archives: Memory Management

Most Influential Paper of OOPSLA 2002: “Reconsidering Custom Memory Allocation”

Our paper, Reconsidering Custom Memory Allocation, was just granted the Most Influential OOPSLA Paper award (given ten years after the paper appeared). Here’s the citation for the award:

Custom memory management is often used in systems software for the purpose of decreasing the cost of allocation and tightly controlling memory footprint of software. Until 2002, it was taken for granted that application-specific memory allocators were superior to general purpose libraries. Berger, Zorn and McKinley’s paper demonstrated through a rigorous empirical study that this assumption is not well-founded, and gave insights into the reasons why general purpose allocators can outperform handcrafted ones. The paper also stands out for the quality of its empirical methodology.

I am grateful to OOPSLA not only for the check for $333.33 :), but also for giving me the chance to publicly stand up and thank my wonderful co-authors: my excellent colleague Ben Zorn and my awesome advisor, Kathryn McKinley (both now at Microsoft Research). The original paper actually did a bit more than the citation suggests – here’s the abstract from the original paper.

Programmers hoping to achieve performance improvements often use custom memory allocators. This in-depth study examines eight applications that use custom allocators. Surprisingly, for six of these applications, a state-of-the-art general-purpose allocator (the Lea allocator) performs as well as or better than the custom allocators. The two exceptions use regions, which deliver higher performance (improvements of up to 44%). Regions also reduce programmer burden and eliminate a source of memory leaks. However, we show that the inability of programmers to free individual objects within regions can lead to a substantial increase in memory consumption. Worse, this limitation precludes the use of regions for common programming idioms, reducing their usefulness.

We present a generalization of general-purpose and region-based allocators that we call reaps. Reaps are a combination of regions and heaps, providing a full range of region semantics with the addition of individual object deletion. We show that our implementation of reaps provides high performance, outperforming other allocators with region-like semantics. We then use a case study to demonstrate the space advantages and software engineering benefits of reaps in practice. Our results indicate that programmers needing fast regions should use reaps, and that most programmers considering custom allocators should instead use the Lea allocator.

Slight correction: they should instead use Hoard :).

Software Needs Seatbelts and Airbags

(This post is a draft version of an article slated to appear in ACM Queue.)

Finding and fixing bugs in deployed software is difficult and time-consuming: here are some alternatives.

Death, Taxes, and Bugs.

Like death and taxes, buggy code is an unfortunate fact of life. Nearly every program ships with known bugs, and probably all of them end up with bugs that are only discovered post-deployment. There are many reasons for this sad state of affairs.

Unsafe Languages.

One problem is that many applications are written in memory-unsafe languages. Variants of C, including C++ and Objective-C, are especially vulnerable to memory errors like buffer overflows and dangling pointers (use-after-free bugs). These bugs, which can lead to crashes, erroneous execution, and security vulnerabilities, are notoriously challenging to repair.

Safe Languages: No Panacea.

Writing new applications in memory-safe languages like Java instead of C/C++ would go some way towards mitigating these problems. For example, because Java uses garbage collection, Java programs are not susceptible to use-after-free bugs; similarly, because Java always performs bounds-checking, Java applications cannot suffer memory corruption due to buffer overflows.

That said, safe languages are no cure-all. Java programs still suffer from buffer overflows and null pointer dereferences, though unlike their C-based counterparts, they throw an exception as soon as one occurs. The common response to these exceptions is to abort execution and print a stack trace (even to a web page!). Java is also just as vulnerable as any other language to concurrency errors like race conditions and deadlocks.

There are both practical and technical reasons not to use safe languages. First, it is generally not feasible to rewrite existing code because of the cost and time involved, not to mention the risk of introducing new bugs. Second, languages that rely on garbage collection are not a good fit for programs that need high performance or which make extensive use of available physical memory, since garbage collection always requires some extra memory [5]. These include OS-level services, database managers, search engines, and physics-based games.

Are Tools the Answer?

While tools can help, they too cannot catch all bugs. Static analyzers have made enormous strides in recent years, but many bugs remain out of reach. Rather than swamp developers with false positive reports, most modern static analyzers deliberately report far fewer bugs than they could: they accept false negatives (failing to report real bugs) in exchange for a lower false positive rate. That trade-off makes these tools more usable, but it also means they will miss real bugs. Dawson Engler and his colleagues made exactly this choice for Coverity’s “unsound” static analyzer (see the Communications of the ACM article on their experiences: A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World [4]).

Testing is Good but Not Enough.

The state of the art in testing tools has also advanced dramatically in the last decade. Randomized fuzz testing can be combined with static analysis to drive tests to explore paths that lead to failure. These tools are now in the mainstream: for example, Microsoft’s Driver Verifier can test device driver code for a wide variety of problems, and now includes randomized concurrency stress testing.

But as Dijkstra famously remarked, “Program testing can be used to show the presence of bugs, but never to show their absence!” At some point, testing will fail to turn up new bugs, which will unfortunately be discovered only once the software has shipped.

Fixing Bugs: Risky (and Slow) Business.

Finding the bugs is only the first step. Once a bug is found, whether by inspection, testing, or analysis, fixing it remains a challenge. Any bug fix must be undertaken with extreme care, since any new code runs the risk of introducing yet more bugs. Developers must construct and carefully test a patch to ensure that it fixes the bug without introducing any new ones. This process can be costly and time-consuming. For example, according to Symantec, the average time between the discovery of a remotely exploitable memory error and the release of a patch for enterprise applications is 28 days [11].

Cut Bait and Ship.

At some point, it simply stops making economic sense to fix certain bugs. Tracking down the source of a bug is often difficult and time-consuming, even when the full memory state and all inputs to the program are available. Obviously, show-stopper bugs must be fixed. For other bugs, the benefits of fixing them may be outweighed by the risks of creating new bugs and the costs in programmer time and delayed deployment.

Debugging at a Distance.

Once faulty software has been deployed, chasing down and repairing bugs becomes dramatically more difficult. Users rarely provide detailed bug reports that allow developers to reproduce the problem.

For deployed software on desktops or mobile devices, getting enough information to find a bug can be difficult. Sending an entire core file is generally impractical, especially over a mobile connection. Typically, the best one can hope for is some logging messages and a minidump consisting of a stack trace and information about thread contexts.

Even this limited information can provide valuable clues. If a particular function appears on many stack traces observed during crashes, then that function is a likely culprit. Microsoft Windows includes an application debugger (formerly “Watson”, now “Windows Error Reporting”) that is used to perform this sort of triage not only for Microsoft but also for third-party applications via Microsoft’s Winqual program. Google has also made available a cross-platform tool called Breakpad that can be used to provide similar services for any application.

However, for many bugs, the kind of information that these tools provide is of limited value. For example, memory corruption errors often trigger failures millions of instructions past the point of the actual error, making stack traces useless. The same is generally true for null dereference exceptions, where the error often happens long after the null pointer was stored.

Captain’s Log: Not Enough Information.

On servers, the situation is somewhat better. Server applications typically generate log messages, which may contain clues to why a program failed. Unfortunately, log files can be unmanageably large. Poring over logs and trying to correlate them to the source code can be extremely time-consuming. Even worse, that work may yield no useful results because the logs are incomplete: they simply may not contain enough messages, or the right kind, to narrow down the source of a particular error. Recent work at Illinois and UC San Diego may lead to tools that address some of these problems; SherLog [12] automates the process of tracing bugs back from log messages to buggy source code paths, and LogEnhancer [13] automatically extends log messages with information to help post-crash debugging. More information on logging appears in a recent Queue article, Advances and Challenges in Log Analysis [10].

God Doesn’t Play Dice, but Your Computer Does.

Despite these advances, finding bugs has actually become harder than ever. Finding bugs was challenging enough when most programs were sequential, but the situation is now far worse. Multithreaded programs, asynchrony, and multiple cores are now a fact of life. Every execution of these non-deterministic programs can differ from the last because of different timings of events and thread interleavings. This makes bugs difficult or impossible to reproduce, even with a complete log of all input events—something that would be too expensive to record in practice anyway.

Bumpers, Seatbelts and Airbags.

Let’s shift gears for a moment to talk about cars (we’ll get back to software in a minute). As an analogy for the situation we find ourselves in, consider the early days of the automobile. For years, safety was an afterthought at best. When designing new cars, the primary considerations were aesthetics and high performance (think tailfins and V-8 engines).

Eventually, traffic fatalities led legislators and car manufacturers to take safety into account. Seatbelts became required standard equipment in US cars in the late 1960s, bumpers in the 1970s, and airbags in the 1980s. Modern cars incorporate a wide range of safety features, including laminated windshields, crumple zones, and anti-lock braking systems. It is now practically unthinkable that anyone would ship a car without these essential safety features.

However, we routinely ship software with no safety measures of any kind. We are in a position similar to that of the automobile industry of the 1950s, delivering software with lots of horsepower and tailfins, complete with decorative spikes on the steering column to make sure that the user will suffer if their application crashes.

Drunk-Driving Through a Minefield.

The potent cocktail of manual memory management mixed with unchecked memory accesses makes C and C++ applications susceptible to a wide range of memory errors. These errors can cause programs to crash or produce incorrect results. Attackers are also frequently able to exploit these memory errors to gain unauthorized access to systems. Since the vast majority of objects accessed by applications are on the heap, heap-related errors are especially serious.

Numerous memory errors happen when programs incorrectly free objects. Dangling pointers arise when a heap object is freed while it is still live, leading to use-after-free bugs. Invalid frees happen when a program deallocates an object that was never returned by the allocator, for example by inadvertently freeing a stack object or an address in the middle of a heap object. Double frees occur when a heap object is deallocated multiple times without an intervening allocation. This last error may at first glance seem innocuous but, in many cases, leads to heap corruption or program termination.

Other memory errors have to do with the use of allocated objects. When an object is allocated with a size that is not large enough, an out-of-bounds error can occur when the memory address to be read or written lies outside the object. Out-of-bounds writes are also known as buffer overflows. Uninitialized reads happen when a program reads memory that has never been initialized; in many cases, uninitialized memory contains data from previously-allocated objects.

Airbags for Your Applications.

Given that we know we will be shipping software with bugs and that the terrain is dangerous, it might make sense to equip it with seatbelts and airbags. What we’d like is to have both resilience and prompt corrective action for any problem that surfaces in our deployed applications.

Let’s focus on C/C++/Objective-C applications—the lion’s share of applications running on servers, desktops, and mobile platforms—and memory errors, the number one headache for applications written in these languages. Safety-equipped memory allocators can play a crucial role in helping to protect your software against crashes.

The Garbage Collection Safety Net.

The first class of errors—those caused by the misuse of free or delete—can be remedied directly by garbage collection. A garbage collector reclaims only objects that it allocated, eliminating invalid frees. It reclaims an object only once there is no longer any way to reach it by traversing pointers from the “roots”: the globals and the stack. That eliminates dangling pointer errors, since by definition no pointers to a reclaimed object can remain. And since it naturally reclaims each object only once, a garbage collector also eliminates double frees.

While C and C++ were not designed with garbage collection in mind, it is possible to plug in a “conservative” garbage collector and entirely prevent free-related errors. The word “conservative” here means that because the garbage collector doesn’t necessarily know which values are pointers (since we are in C-land), it conservatively assumes that if a value looks like a pointer (it is in the right range and properly aligned) and acts like a pointer (it points into a valid object), then the collector treats it as one.

The Boehm-Demers-Weiser conservative garbage collector is an excellent choice for this purpose: it is reasonably fast and space-efficient, and can be used to directly replace memory allocators by configuring it to treat calls to free as NOPs.

Slipping Through the Net.

While garbage collectors eliminate free-related errors, they cannot help prevent the second class of memory errors: those that have to do with the misuse of allocated objects such as buffer overflows.

Runtime systems that can find buffer overflows often impose staggeringly high overheads, making them not particularly suitable for deployed code. Tools like Valgrind’s MemCheck are incredibly comprehensive and useful, but are heavyweight by design and slow execution by orders of magnitude [7].

Compiler-based approaches can reduce overhead substantially by avoiding unnecessary checks, though they entail recompiling all of an application’s code, including libraries. Google has recently made available AddressSanitizer, a combination of compiler and runtime technology that can find a number of bugs, including overflows and use-after-free bugs. While it is much faster than Valgrind, its overhead remains relatively high (around 75%), making it primarily useful for testing.

Your Program Has Encountered An Error. Goodbye, Cruel World.

All of these approaches are based on the idea that the best thing to do upon encountering an error is to abort immediately. This fail-stop behavior is certainly desirable in testing. However, it is not usually what your users want. Most application programs are not safety-critical systems, and aborting them in midstream can be an unpleasant experience for users.

Suppose you have been working on a Microsoft Word document for hours (and, for some mysterious reason, auto-save has not been turned on). If Microsoft Word suddenly discovers that some error has occurred, what should it do? It could pop up a window indicating that something terrible has happened and asking whether you would like it to send a note home to Redmond. That might be the best thing to do from a debugging perspective, but most people would prefer that Word do its damnedest to save the current document rather than fall on its sword when it discovers a possible error. In short, users generally prefer their applications to be fault tolerant whenever possible.

Bohr versus Heisenberg.

In fact, the exact behavior users do not want is for an error to happen consistently and repeatably. In his classic 1985 article “Why do computers stop and what can be done about it”, Jim Gray drew a distinction between two kinds of bugs. The first kind consists of bugs that behave predictably and repeatably—that is, ones that occur every time the program encounters the same inputs and goes through the same sequence of steps. These are Bohr bugs, by analogy with the classical atomic model, where electrons circle the nucleus in planetary-like orbits. Bohr bugs are great when debugging a program, since they are easier to reproduce and to trace to their root cause.

The second kind of bugs are Heisenbugs, meant to connote the inherent uncertainty in quantum mechanics, which are unpredictable and cannot be reliably reproduced. The most common Heisenbugs these days are concurrency errors, a.k.a. race conditions, which depend on the order and timing of scheduling events to appear. Heisenbugs are also often sensitive to the observer effect: attempts to find the bug by inserting debugging code or running in a debugger often disrupt the sequence of events that led to the bug, making it go away.

Jim Gray makes the point that while Bohr bugs are great for debugging, what users want are Heisenbugs. Why? Because a Bohr bug is a showstopper for the user: every time the user does the same thing, they will encounter the same bug. But with Heisenbugs, the bugs often go away when you run the program again. If a program crashes, and the problem is a Heisenbug, then running the program again is likely to work. This is a perfect match for the way users already behave on the Web. If they go to a web page and it fails to respond, they just click “refresh” and that usually solves the problem.

So one way we can make life better for users is to convert Bohr bugs into Heisenbugs, if we can figure out how to do that.

Defensive Driving with DieHard.

My graduate students at the University of Massachusetts Amherst and I, in collaboration with my colleague Ben Zorn at Microsoft Research, have been working for the past few years on ways to protect programs from bugs. The first fruit of that research is a system called DieHard that makes memory errors less likely to impact users. DieHard eliminates some errors entirely and converts the others into (rare) Heisenbugs.

To explain how DieHard works, let’s go back to the car analogy. One way to make it less likely for cars to crash into each other is for them to be spaced further apart, providing adequate braking distance in case something goes wrong. DieHard provides this “defensive driving” by taking over all memory management operations and allocating objects in a space larger than required.

This de facto padding increases the odds that a small overflow will end up in unallocated space where it can do no harm. However, DieHard doesn’t just add a fixed amount of padding between objects. That would provide great protection against overflows small enough to fit in the padding, and zero protection against larger ones. In other words, those overflows would still be Bohr bugs.

Instead, DieHard provides probabilistic memory safety by randomly allocating objects across the heap. DieHard adaptively sizes its heap to be a bit larger than the maximum needed by the application (by default, 1/3 larger) [2, 3]. DieHard allocates memory from increasingly large chunks that we call miniheaps.

By randomly allocating objects across all the miniheaps (see this diagram for a detailed view), DieHard makes many memory overflows benign, with a probability that naturally declines as the overflow increases in size and as the heap becomes full. The effect is that, in most cases when running with DieHard, a small overflow is likely to have no effect.

DieHard’s random allocation approach also reduces the likelihood of the free-related errors that garbage collection addresses. DieHard uses bitmaps, stored outside the heap, to track allocated memory. A bit set to ’1’ indicates that a given block is in use, and ’0’ that it is available.

This use of bitmaps to manage memory eliminates the risk of double frees, since resetting a bit to zero twice is the same as resetting it once. Keeping the heap metadata separate from the data in the heap also makes it impossible for stray application writes to inadvertently corrupt that metadata.

Most importantly, DieHard drastically reduces the risk of dangling pointer errors, which effectively go away. If the heap has one million freed objects, the chance that the allocator will immediately reuse the one that was just freed is literally one in a million. Contrast this with most allocators, which immediately reclaim freed objects. With DieHard, even after 10,000 reallocations, there is still a 99% chance that the dangled object will not be reused.

Because it performs its allocation in (amortized) constant time, DieHard can provide added safety with very little additional cost in performance. For example, using it in a browser results in no perceivable performance impact.

Tolerating Faults FTW with FTH.

At Microsoft Research, tests with a variant of DieHard resolved about 30% of all bugs in the Microsoft Office bug database, while having no perceivable impact on performance. Beginning with Windows 7, Microsoft Windows ships with a Fault-Tolerant Heap (FTH) that was directly inspired by DieHard. Normally, applications use the default heap, but after a program crashes more than a certain number of times, the Fault-Tolerant Heap takes over. Like DieHard, the Fault-Tolerant Heap manages heap metadata separately from the heap. It also adds padding and delays allocations, though it does not provide DieHard’s probabilistic fault tolerance because it does not randomize allocations or deallocations. The Fault-Tolerant Heap approach is especially attractive because it acts like an airbag: effectively invisible and cost-free when everything is fine, but providing protection when it is needed.

Exterminating the Bugs.

Tolerating bugs is one way to improve the effective quality of deployed software. It would be even better if somehow the software could not only tolerate faults but also correct them. A follow-on to DieHard, called Exterminator, does exactly that [8, 9]. Exterminator uses a version of DieHard extended to detect errors, and uses statistical inference to compute what kind of error happened and where the error occurred. Exterminator not only can send this information back to programmers for them to repair the software, but it also automatically corrects the errors via runtime patches. For example, if it detects that a certain object was responsible for a buffer overflow of 8 bytes, it will always allocate such objects (distinguished by their call site and size) with an 8-byte pad. Exterminator can learn from the results of multiple runs or multiple users, so it could be used to proactively push out patches to prevent other users from experiencing errors it has already detected elsewhere.

The Future: Safer, Self-Repairing Software.

My group and others (notably Martin Rinard at MIT, Vikram Adve at Illinois, Yuanyuan Zhou at UC-San Diego, Shan Lu at Wisconsin, and Luis Ceze and Dan Grossman at Washington) have made great strides in building safety systems for other classes of errors. We have recently published work on systems that prevent concurrency errors, some of which we can eliminate automatically. Grace is a runtime system that eliminates concurrency errors for concurrent programs that use “fork-join” parallelism. It hijacks the threads library, converting threads to processes “under the hood”, and uses virtual memory mapping and protection to enforce behavior that gives the illusion of a sequential execution, even on a multicore processor [1]. Dthreads (“Deterministic Threads”) is a full replacement for the POSIX threads library that enforces deterministic execution for multithreaded code [6]. In other words, a multithreaded program running with dthreads never has races; every execution with the same inputs generates the same outputs.

We look forward to a day in the not too distant future when such safer runtime systems are the norm. Just as we can now barely imagine cars without their myriad of safety features, we are finally adopting a similar philosophy for software. Buggy software is inevitable, and when possible, we should deploy safety systems that reduce their impact on users.


  1. E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for C/C++. In S. Arora and G. T. Leavens, editors, OOPSLA, pages 81–96. ACM, 2009.
  2. E. D. Berger and B. G. Zorn. DieHard: Probabilistic memory safety for unsafe languages. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 158–168, New York, NY, USA, 2006. ACM Press.
  3. E. D. Berger and B. G. Zorn. Efficient probabilistic memory safety. Technical Report UMCS TR-2007-17, Department of Computer Science, University of Massachusetts Amherst, Mar. 2007.
  4. A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler. A few billion lines of code later: using static analysis to find bugs in the real world. Commun. ACM, 53(2):66–75, Feb. 2010.
  5. M. Hertz and E. D. Berger. Quantifying the performance of garbage collection vs. explicit memory management. In OOPSLA ’05: Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications, pages 313–326, New York, NY, USA, 2005. ACM Press.
  6. T. Liu, C. Curtsinger, and E. D. Berger. Dthreads: Efficient Deterministic Multithreading. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pages 327–336, New York, NY, USA, 2011. ACM.
  7. N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. In Proceedings of 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 89–100. ACM Press, June 2007.
  8. G. Novark, E. D. Berger, and B. G. Zorn. Exterminator: automatically correcting memory errors with high probability. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 1–11, New York, NY, USA, 2007. ACM Press.
  9. G. Novark, E. D. Berger, and B. G. Zorn. Exterminator: Automatically correcting memory errors with high probability. Communications of the ACM, 51(12):87–95, 2008.
  10. A. Oliner, A. Ganapathi, and W. Xu. Advances and challenges in log analysis. Commun. ACM, 55(2):55–61, Feb. 2012.
  11. Symantec. Internet Security Threat Report, Sept. 2006.
  12. D. Yuan, H. Mai, W. Xiong, L. Tan, Y. Zhou, and S. Pasupathy. Sherlog: error diagnosis by connecting clues from run-time logs. In Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, ASPLOS ’10, pages 143–154, New York, NY, USA, 2010. ACM.
  13. D. Yuan, J. Zheng, S. Park, Y. Zhou, and S. Savage. Improving software diagnosability via log enhancement. ACM Trans. Comput. Syst., 30(1):4:1–4:28, Feb. 2012.

Best. SPEC configuration flag. Ever!

“Hoard” — enabling the use of my Hoard memory allocator — is now an officially sanctioned configuration flag for SPEC CPU2006 (the industry-standard way to measure CPU performance)! See the flags in use for the Intel compiler and Open64. My opinions about benchmarking notwithstanding, I am OK with my work being a standard configuration flag :).

The Evolution is Televised

I was just at POPL and got a very nice compliment on a talk I gave on Grace from someone who watched it on-line (!). That talk was the first in my on-going efforts to eliminate all text from my slides. My latest talk has no text whatsoever, except for the titles (not here, though – haven’t given it at MSR!). Anyway, thanks to Microsoft, you can watch the evolution (talks ordered from most recent to oldest).

Sheriff: Detecting and Eliminating False Sharing

Grace: Safe Multithreaded Programming for C/C++ (paper)

Exploiting Multiple Cores Today: Scalability and Reliability For Off-the-shelf Software (Flux, DieHard)

Garbage Collection without Paging (paper)

I’m a Mac (or, “Emery Inside”)

I’m a Mac (though I prefer John Hodgman)

I used to be a PC guy, but have completely gone Mac (MacBook Air, Mac Mini, iPhone, iPad, Jobs Distortion Field Glasses, etc.). But Mac went Emery before Emery went Mac! Proof below:


Multithread enhancements for “tiny” allocations introduced February 2008.
These are in the spirit of “Hoard”. See:
Berger, E.D.; McKinley, K.S.; Blumofe, R.D.; Wilson, P.R. (2000).
“Hoard: a scalable memory allocator for multithreaded applications”.
ACM SIGPLAN Notices 35 (11): 117–128.

Retrieved on 2008-02-22.

Winning the War on Bugs

This is a draft version of an article to appear in our departmental newsletter, Significant Bits (with links added).

Nearly all software ships with known bugs, and others are lurking in the code waiting to be discovered. Some bugs are benign; for example, a page might not display correctly in a browser. But more serious bugs cause programs to crash unexpectedly or leave them vulnerable to attack by hackers. These bugs are difficult for programmers to find and fix. Even when bugs are critical and security-sensitive, an average of one month passes between the initial bug report and the delivery of a patch.

Rather than waiting for programmers to fix their bugs, or for hackers to find and exploit them, Professor Emery Berger’s group is designing systems to make software bug-proof. These systems allow buggy programs to run correctly, make them resistant to attack, and even automatically find and fix certain bugs. This work, developed jointly with Ben Zorn at Microsoft Research, was an important influence on the design of the Fault-Tolerant Heap that today makes Windows 7 more resistant to errors.

Defending Against Bugs

Berger and Zorn first developed an error-resistant system called DieHard, inspired by the film featuring Bruce Willis as an unstoppable cop.

DieHard attacks the widespread problem of memory errors. Programs written in the C and C++ programming languages – the vast majority of desktop, mobile, and server applications – are susceptible to memory errors. These bugs can lead to crashes, erroneous execution, and security vulnerabilities, and are notoriously costly to repair.

Berger uses a real-estate analogy to explain the problem of memory errors. Almost everything done on a computer uses some amount of memory—each graphic on an open Web page, for example—and when a program is running, it is constantly “renting houses” (chunks of memory) to hold each item, and putting them back on the market when they are no longer needed. Each “house” has only enough square footage for a certain number of bytes.

Programmers can make a wide variety of mistakes when managing their memory. They can unwittingly rent out houses that are still occupied (a dangling pointer error). They can ask for less space than they need, so items will spill over into another “house” (a buffer overflow). A program can place a house up for rent multiple times (a double free), or even try to rent out a house that doesn’t exist (an invalid free), leading to havoc when the renter shows up. These mistakes can make programs suddenly crash, or worse: they can make a computer exploitable by hackers.
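In C/C++ terms, the four error classes above look roughly like this. This is an illustrative catalog (the function names are mine, not from DieHard); each buggy line is commented out because it is undefined behavior, and these are precisely the mistakes DieHard is designed to tolerate:

```cpp
#include <cstdlib>
#include <cstring>

bool dangling_pointer_demo() {
    char *house = static_cast<char *>(std::malloc(16));
    std::free(house);        // the house goes back on the market...
    // house[0] = 'x';       // ...but the old tenant still writes to it (UB)
    return true;
}

bool buffer_overflow_demo() {
    char *house = static_cast<char *>(std::malloc(8));  // 8 bytes of "square footage"
    // std::strcpy(house, "far too long for 8 bytes");  // spills into the neighbor's house (UB)
    std::free(house);
    return true;
}

bool double_free_demo() {
    char *house = static_cast<char *>(std::malloc(16));
    std::free(house);
    // std::free(house);     // the same house listed for rent twice (UB)
    return true;
}

bool invalid_free_demo() {
    int not_a_house = 0;
    // std::free(&not_a_house);  // renting out a house that doesn't exist (UB)
    return not_a_house == 0;
}
```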

The way “addresses” are assigned also makes computers vulnerable. Houses (memory locations) with especially desirable valuables, like passwords, will always be on the same lot on the same street. If hackers can locate a password once, they can easily locate the password’s address on anyone’s version of the same program.

DieHard attacks these problems in several ways. First, it completely prevents certain memory errors, like double and invalid frees, from having any effect. DieHard keeps important information, like which houses are rented and which are not (heap metadata), out of a hacker’s reach. Most importantly, DieHard randomly assigns addresses—a password that has a downtown address in one session may be in the suburbs next time around. This randomization not only adds security but also increases resilience to errors, reducing the odds that dangling pointer errors or small or moderate overflows will have any effect.
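The randomized-placement idea can be sketched in a few lines. This is a toy model under my own assumptions, not DieHard’s allocator: an over-provisioned heap of fixed-size slots, with each allocation landing in a randomly chosen free slot, so the same object gets a different “address” every run:

```cpp
#include <random>
#include <vector>
#include <cstddef>

// Toy randomized heap: slots are either rented (true) or free (false).
class RandomizedHeap {
    std::vector<bool> used;
    std::mt19937 rng;
public:
    RandomizedHeap(std::size_t slots, unsigned seed)
        : used(slots, false), rng(seed) {}

    // Pick a random free slot; with the heap over-provisioned, a free
    // slot turns up quickly with high probability.
    long alloc_slot() {
        std::uniform_int_distribution<std::size_t> pick(0, used.size() - 1);
        for (int tries = 0; tries < 1000; ++tries) {
            std::size_t i = pick(rng);
            if (!used[i]) { used[i] = true; return static_cast<long>(i); }
        }
        return -1;  // (probabilistically) full
    }

    void free_slot(std::size_t i) { used[i] = false; }
};
```

Because placement depends on the random seed, an attacker who finds a target’s address in one run learns nothing about where it will live in the next.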

Exterminating the Bugs

While Professor Berger is more than pleased that the DieHard work has influenced the Windows 7 Fault-Tolerant Heap, he hopes that Microsoft will adopt the technology that Zorn, Berger, and his Ph.D. student Gene Novark designed next, called Exterminator. Exterminator not only finds errors but also automatically fixes them. Exterminator uses a variant of DieHard (called DieFast) that constantly scans memory looking for signs of errors. DieFast places “canaries” – specific random numbers – in unused memory. Just like in a coalmine, a “dead” canary means trouble. When DieFast discovers a dead canary, it triggers a report containing the state of memory.
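The canary check at the heart of this approach can be sketched as follows. This is a rough illustration, not DieFast’s actual code: DieFast uses genuinely random canary values and real heap metadata, whereas here a fixed constant stands in for the random number:

```cpp
#include <cstdint>
#include <cstddef>

static const std::uint32_t kCanary = 0xDEADBEEF;  // DieFast uses a *random* value

// Fill an unused chunk with the canary value.
void plant_canary(void *chunk, std::size_t bytes) {
    std::uint32_t *words = static_cast<std::uint32_t *>(chunk);
    for (std::size_t i = 0; i < bytes / sizeof(std::uint32_t); ++i)
        words[i] = kCanary;
}

// Later, scan the chunk: any changed word means some buggy code wrote
// to memory it did not own -- a "dead" canary.
bool canary_alive(const void *chunk, std::size_t bytes) {
    const std::uint32_t *words = static_cast<const std::uint32_t *>(chunk);
    for (std::size_t i = 0; i < bytes / sizeof(std::uint32_t); ++i)
        if (words[i] != kCanary) return false;
    return true;
}
```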

Exterminator next applies forensic analysis to these reports. With information gleaned from several users running a buggy program, Exterminator can pinpoint the source and location of memory errors. From that point on, Exterminator protects the program from that error by “padding” buggy memory requests to prevent overflows, and delaying premature relinquishing of memory to prevent dangling pointer errors.

Berger notes that since Microsoft already gathers information when programs crash, using techniques similar to those in Exterminator would be a natural next step to quickly find and fix memory errors.

Professor Berger is now tackling the problem of concurrency errors – bugs that are becoming more common with the widespread adoption of multicore CPUs. His group recently developed Grace, a system that prevents concurrency errors in C and C++ programs, and Berger hopes that some version of it will also gain widespread adoption as part of an arsenal to protect programs from bugs.

New Hoard release

I just released a new version of Hoard that incorporates lots of changes (one of which I described in an earlier post).

This release incorporates a number of fixes and improvements, including:

  • Better per-thread allocation, improving speed for Unix platforms that do not support __thread (a fast way to do thread-specific data)
  • Added interception of Solaris threads API (thr_*) to work with older Solaris programs (the subject of the earlier post)
  • Fixed a rare race condition accidentally introduced in 3.7.1
  • Increased checking to protect against heap corruption or other errors
  • Added GNU-supported hooks on platforms with glibc (especially Linux), allowing Hoard to work better with legacy programs (even emacs!)
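The `__thread` point above is worth a sketch. The names below are illustrative (not Hoard’s internals), but they show the two lookup paths a per-thread allocator can take: with compiler-supported TLS, finding the current thread’s heap is essentially a single load; without it, every lookup goes through the slower pthread thread-specific-data API:

```cpp
#include <pthread.h>

struct ThreadHeap { int freeListCount = 0; /* per-thread allocator state */ };

#if defined(__GNUC__)
// Fast path: compiler-supported thread-local storage.
static __thread ThreadHeap *myHeap = nullptr;

ThreadHeap *getMyHeap() {
    if (!myHeap) myHeap = new ThreadHeap;  // first call on this thread
    return myHeap;
}
#else
// Slow path: POSIX thread-specific data, one key lookup per call.
static pthread_key_t heapKey;
static pthread_once_t keyOnce = PTHREAD_ONCE_INIT;
static void makeKey() { pthread_key_create(&heapKey, nullptr); }

ThreadHeap *getMyHeap() {
    pthread_once(&keyOnce, makeKey);
    ThreadHeap *h = static_cast<ThreadHeap *>(pthread_getspecific(heapKey));
    if (!h) { h = new ThreadHeap; pthread_setspecific(heapKey, h); }
    return h;
}
#endif
```

Since `malloc` sits on nearly every hot path, shaving the TSD lookup down to a TLS access is exactly the kind of per-call saving that shows up in end-to-end speed.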

I was wrong! And right!

Turns out I was wrong and right at the same time.

I thought that the problem Thomson Reuters had discovered (detailed in the last post) was that Hoard’s spinlock implementation for Sparc was somehow broken, possibly by a newer version of the Sparc architecture (e.g., by using a horrible relaxed memory model). See this info from Sun about such models, including TSO and (yikes) RMO.

Suffice it to say that relaxed memory ordering breaks things in addition to being absolutely awful to reason about. But luckily, apparently saner minds have prevailed and that memory ordering — while supported by the Sparc architecture — is never enabled. Phew.

Anyway, while chasing down the bug I discovered an “impossible” sequence of events (a race, but under the protection of a lock), and switching from spinlocks to POSIX locks (slower, but safe) solved the problem. Aha! Plainly, something wrong with the spinlock! But, it turns out, the spinlock code is perfectly fine. It’s practically identical to what Hans Boehm does in his atomics library.

Next time both Hans and I have done things the same way, I will assume that Hans probably got it right, so I’m OK. 🙂 The real source of the problem was elsewhere, but it points up the perils of supporting cross-platform software (and legacy systems).

First, a little background.

Hoard has an optimization specifically for single-threaded programs. Unless you spawn a thread, Hoard does not enable any locking code or atomic operations (locks become simple loads and stores). IIRC, this increases performance by several percentage points for some programs, so it’s nothing to sneeze at.

It’s simple enough: Hoard has a Boolean value called anyThreadCreated, which is initially false. The Hoard library intercepts the pthread_create call (on Windows, it does something else that has the same effect). Among other things, any call to create a thread immediately sets anyThreadCreated to true. The spinlock implementation then enables real locking.

As you can imagine, if somehow a program were to run multiple threads with locks disabled, that would be bad.
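A minimal sketch of this optimization, under my own naming (Hoard’s real code differs): until the flag flips, “locking” is just a load and a couple of plain stores; once a thread is created, the same lock becomes a real spinlock.

```cpp
#include <atomic>

// Set to true by the allocator's thread-creation wrapper, e.g.:
//   int pthread_create(...) { anyThreadCreated = true; return real_pthread_create(...); }
static std::atomic<bool> anyThreadCreated(false);

class MaybeSpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        if (!anyThreadCreated.load(std::memory_order_relaxed)) {
            locked.store(true);  // single-threaded: no atomic read-modify-write needed
            return;
        }
        // Multithreaded: a genuine test-and-set spinlock.
        while (locked.exchange(true, std::memory_order_acquire))
            ;  // spin
    }
    void unlock() { locked.store(false, std::memory_order_release); }
};
```

The whole scheme is safe only if the flag is set before the second thread can possibly touch the heap, which is why intercepting every thread-creation entry point matters so much.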

Enter Solaris threads.

It turns out that Solaris has not one but two thread APIs. Solaris threads were the predecessor to the now-familiar POSIX threads API, but some code still uses the old, non-portable API. Notably, the code running at Thomson Reuters.

Yeah, I knew about Solaris threads, since I programmed with them “back in the day” (the mid-to-late ’90s), but I had overlooked them here.

What was happening was that Hoard was not intercepting thr_create (Solaris threads). It thus assumed that no threads had been created, even though they had. So Thomson Reuters’ multithreaded code was running with the spinlocks disabled.

No wonder it crashed. It’s surprising it worked at all.

So now Hoard properly intercepts thr_create. Bug fixed. Life is good.

That said, I still think that exposing programmers to the vagaries of hardware memory models should be a felony offense.

Memory Models – Fuhgeddaboutit, Part 1

As promised, a brief article on Hoard. (For those of you who do not know, Hoard is a highly scalable, high-performance malloc replacement that gets a fair amount of use in The Real World. I wrote it and maintain it.)

You’ll see the reason for the title in a minute.

Thomson Reuters uses Hoard in their servers. They are currently using Hoard 2.1, which is kind of old (I released it in December 2001). The new version, Hoard 3.7.1, incorporates tons of improvements which I should probably write up someday, since people keep referencing the original Hoard paper from ASPLOS 2000, which was already out of date in 2001!

Anyway, while evaluating Hoard, the good folks at Thomson Reuters discovered that the new version of Hoard occasionally leads to crashes on their Sun systems.

You never know with these things, because a new memory allocator often shakes out application bugs that previously were latent (e.g., buffer overflows that now trash different things). But they came up with clearly bug-free code that reproduces the error, which they sent me. Convincing, and stupendously useful.

I tested the code on our big Niagara box, and sure enough, boom. But Hoard is in daily use by tons of folks, in lots of servers all around the world. What’s going on?

Even with the code, it took some time to track down. I shipped Thomson Reuters a patched version today, and will soon release version 3.7.2, incorporating these fixes.

Hint — both Hans Boehm and I have this wrong (so I am in very good company). Oh yeah, the title is a bit of a clue.