I’m Old, Part XII: Works for Me

 

While I was working on Adobe Acrobat, the worst bugs I had to deal with were the ones that came with a mystical recipe from QA. When reproducing a bug took 10 or more steps, you knew you were in trouble. On top of that, we would often get bugs that happened on the QA tech’s machine, but not on ours. This was not a surprise – in most cases, QA was running a release build of the code and engineers were running a debug build. Since the code size of the two builds is different, the memory layout is different. That alone is enough to produce a “works for me” situation. It doesn’t help that both the code flow and the register allocation for local variables are subtly different between a release build and a debug build.

Worse on top of worse is the set of bugs that include a complex recipe as well as “crashes sometimes”.

We had one QA engineer, Dina Sakahara, who was incredibly tenacious about cutting the number of steps down as well as turning a “crashes sometimes” bug into a “crashes every time”.

At one point, I was assigned a bug that I could not reproduce. Worse, by the time it manifested, the actual damage had probably been done long before. One class of bug in this category is called a “heap smasher” or a Heisenbug. Code writes into a chunk of memory that it shouldn’t and damages a data structure. The damage doesn’t always show up until much later.

On the Macintosh, Acrobat 1.0 was written in a combination of C and C++. Well, actually, the compiler we were using for the Mac wasn’t fully C++. It was close, though.

C, like most high-level languages, allocates memory for local variables on the machine stack. This is very convenient and takes a scant few instructions on most architectures. One problem is that C doesn’t initialize local variables by default. This means that if you use a variable before it’s been initialized, you have garbage in your variable. Maybe. Sometimes, it might be conveniently correct. Like in the debug build, for example. Sometimes, it might be inconveniently incorrect. Like in the release build, for example. The end result is that you might end up shotgunning the heap in release, but not in debug.
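To see why an uninitialized local can be “conveniently correct” one moment and garbage the next, here is a toy model of a stack in C. It is only a sketch: `stack_mem`, `enter_frame`, `leave_frame`, `earlier_call`, and `buggy_call` are made-up names, and the array stands in for the real machine stack so that every read in the example is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy stack: a plain array standing in for the machine stack (assumption). */
static uint32_t stack_mem[STACK_WORDS];
static size_t stack_top; /* index of the next free word */

/* "Enter a function": reserve space for locals. Like C, it does not clear it. */
uint32_t *enter_frame(size_t words)
{
    assert(stack_top + words <= STACK_WORDS);
    uint32_t *locals = &stack_mem[stack_top];
    stack_top += words;
    return locals;
}

/* "Return": release the space. The old values stay behind in memory. */
void leave_frame(size_t words)
{
    assert(words <= stack_top);
    stack_top -= words;
}

/* A well-behaved function that happens to leave 42 in its first local. */
void earlier_call(void)
{
    uint32_t *locals = enter_frame(2);
    locals[0] = 42;
    locals[1] = 7;
    leave_frame(2);
}

/* A buggy function: reads its first local without ever writing it. */
uint32_t buggy_call(void)
{
    uint32_t *locals = enter_frame(2);
    uint32_t value = locals[0]; /* whatever the previous frame left here */
    leave_frame(2);
    return value;
}
```

In this model, what `buggy_call` returns depends entirely on which function last used that slot. Change the call order, the frame sizes, or the build settings, and the “garbage” changes with them.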

I had one of these bugs and I spent 3 weeks trying to narrow it down and was making no progress at all. Crap.

While I thought through what causes this class of problem, what I wanted was a way to change the semantics of C such that local variables were initialized to known values.

Looking over the compiler documentation, I read up on its ability to do code profiling. Profiling is a way to measure how long each function takes as well as how often it gets called. The compiler had the ability to inject measurement code into every function. Even better, it had the ability to make that code replaceable with custom code. Ha! The game is afoot!

I wrote a custom profiler that did no profiling at all. Instead, when it was called to start profiling a function, it would execute some code that I crafted to walk the stack back to the function, find the code that set up local variables, and write carefully crafted garbage into that space. I think I repeatedly wrote 0xdeadbeef into that space. If a local variable was a pointer to memory, the moment that it got used, the program would crash hard. I also initialized all the registers to 0xdeadbeef.
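The original hook had to walk the real stack from inside the compiler’s profiling entry point, which is compiler- and CPU-specific and not reproduced here. The poisoning step itself is simple, though. The sketch below shows only that step on an ordinary buffer; `poison_region`, `is_poisoned`, and `POISON_WORD` are illustrative names, not anything from the Acrobat code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_WORD 0xDEADBEEFu

/* Fill [start, start + bytes) with repeated 0xDEADBEEF words.
   Any leftover 1..3 bytes get the first bytes of the pattern. */
void poison_region(void *start, size_t bytes)
{
    const uint32_t word = POISON_WORD;
    unsigned char *p = start;

    assert(start != NULL);
    while (bytes >= sizeof word) {
        memcpy(p, &word, sizeof word);
        p += sizeof word;
        bytes -= sizeof word;
    }
    memcpy(p, &word, bytes); /* copies 0..3 trailing bytes */
}

/* Returns 1 if a 32-bit slot still holds the poison value,
   meaning nothing has written to it since it was poisoned. */
int is_poisoned(const void *slot)
{
    uint32_t word;

    memcpy(&word, slot, sizeof word);
    return word == POISON_WORD;
}
```

The point of a pattern like 0xdeadbeef is that it is easy to spot in a debugger and, on many systems, is not a usable address, so a pointer that was never initialized fails immediately instead of quietly scribbling on the heap.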

When I unleashed this code on the debug build, I found crash after crash. Some were in Macintosh-only code, some in shared code. After I fixed the problems and checked the code in, my bug went away, as did a bunch of other bugs assigned to other engineers on Mac, Windows, and DOS.

I wasn’t happy, though. I didn’t really have proof that I had fixed the bug I was after. I knew I had fixed many bugs, but I couldn’t ever say that this was one of them.

At this point, most programming languages have adopted at least one of these solutions to this particular problem:

  1. Issue an optional warning on use of uninitialized data
  2. Forbid the use of uninitialized data
  3. Initialize anything not specified otherwise to 0’s
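In C itself, the first option is available today as a compiler warning (GCC and Clang both have `-Wuninitialized`, and `-Wall` turns it on), and the third can be done by hand with an initializer. Newer GCC and Clang releases also offer `-ftrivial-auto-var-init=zero` or `=pattern`, which is close in spirit to the 0xdeadbeef trick; check your compiler’s documentation before relying on it. A minimal sketch of the by-hand version, using a made-up `struct cursor`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct cursor {
    const char *text;
    size_t offset;
    int line;
};

struct cursor make_cursor(const char *text)
{
    /* "= {0}" sets every member to zero (NULL for pointers),
       so no field is ever left holding stack garbage. */
    struct cursor c = {0};

    assert(text != NULL);
    c.text = text;
    return c;
}
```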

One of the reasons this was such a problem in C was the specification of where local variables are declared within a function.

In both Pascal and the C of that era (before C99), local variables are declared immediately after a block start and before lines of code. This makes the language grammar a little less complicated and often makes the compiler easier to write, but the cost is brutal in terms of bugs.

C++ (and most other later languages) let you declare variables almost anywhere in the code stream. This is a good thing – having a variable declaration closer to its use reduces bugs and makes refactoring easier.
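The difference is easy to see side by side. The two functions below do the same thing; the names are invented for the comparison. The first follows the old declare-at-the-top rule, the second uses the declare-at-first-use style that C99 and C++ allow.

```c
#include <assert.h>
#include <stddef.h>

/* Old style: all locals are declared at the top of the block,
   well before they receive a meaningful value. */
int sum_positive_c89(const int *values, size_t count)
{
    size_t i;
    int value;
    int total;

    assert(values != NULL || count == 0);
    total = 0;
    for (i = 0; i < count; i++) {
        value = values[i];
        if (value > 0) {
            total += value;
        }
    }
    return total;
}

/* C99 and later: each local is declared where it is first needed
   and initialized in the same statement. */
int sum_positive_c99(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        const int value = values[i];
        if (value > 0) {
            total += value;
        }
    }
    return total;
}
```

In the first version, deleting or moving the `total = 0;` line compiles fine and leaves `total` uninitialized. In the second, the declaration and the initial value travel together, so that mistake is much harder to make.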

Still, I have a soft spot for C. It is a minimal language and was designed to map onto most hardware very directly. So directly, in fact, that at that time I could look at a chunk of code and not only figure out whether the source language was C or Pascal, but also tell you which of the major compilers had been used for it.

In my career, I enjoy having these flashes of insight to solve problems. I only wish they didn’t have to come with the brutal overhead of weeks of searching for a bug.
