Before I dig into this, I want to talk about the C programming language. C is truly impressive in that it is a high-level language that still cleaves very closely to CPU architectures. If you’re writing system code, this is a very good thing, because you can predict fairly accurately what assembly language will be generated from any given block of C. Still, it is not without its issues.
As the saying goes, “C combines the power and performance of assembly language with the flexibility and ease-of-use of assembly language,” which means that you need to be extra careful about a lot of things, especially memory and pointers.
It is, unfortunately, easy to create all kinds of nasty bugs involving memory misuse and pointer misuse because, simple as the concepts may be, we’re human and we make mistakes. The most common mistakes are reading or writing through a pointer that has not been initialized, and reading or writing beyond the bounds of a particular block of memory. If you’re lucky, such an error will cause an immediate and reproducible failure. Unfortunately, these bugs can also cause damage that goes undetected for a very long time.
This type of bug is sometimes called shotgunning the heap (or stack) or a heap smasher. If you can’t catch it at the point of inception, tracking it down is very challenging. I have tracked down many of these bugs in my career and none of them have been pleasant experiences. There are a number of tools that you can use, but most of them are shots in the dark.
Brian Fitzpatrick posted this excerpt, called “Shit Sandwich,” about how to deliver corrective feedback to engineers. It reminds me of an old UNIX fortune that went something like “Life is like a shit sandwich: the more bread you’ve got, the less shit you have to eat.” But that’s not what he talks about. You should read his essay; it’s good.
I would add to it the importance of saying so when you know that you are sending an engineer on a death march. I’ve been sent on these in the past: a heap smasher that needs to be tracked down and precious little to go on. It’s a process that can take weeks. And it’s weeks of trying things: trying to find the magic incantation that makes the bug happen consistently, trying to narrow it down to the point of inception, trying to find the actual cause. It can be weeks of doing similar things repeatedly and making little progress.
The worst part of being sent on one of these, besides the frustration and drudgery, is when the task isn’t even acknowledged. I’ve had several of those, and I resented every single one. When I worked on Acrobat Catalog for the Mac, we had a heap smasher that only happened on PowerPC Macs when indexing files on network shares. I was sent on this one with no acknowledgement of the impending pain. Ultimately, it turned out not to be my code at all, but Apple’s implementation of TCP/IP. It took weeks to narrow it down.
When I was at Atalasoft, we had some awful bugs in that category. A few of them I tracked down myself. Others, for various reasons, had to be handed off to junior engineers. For example, we had a long-standing bug that we suspected was in our JPEG2000 decoder, which we licensed from an outside company. The product generally worked fine, except that once in a while our automated unit tests would show failures. Most of the time, the test would pass on the next run, which, unfortunately, left us nothing to chase. Sometimes the tests would fail consistently, at which point we would put someone on it to try to replicate the conditions so that we could contact the vendor with a test case.
When I had to assign an engineer to this kind of job, I tried to do a few things:
- I tried to explain that I knew what kind of task this was and how big it was, and that I understood all too well what they were in for.
- I tried to offer as much support as I could as a reference: how to use gflags, for example, how to create memory pools to detect illegal writes, or how to work with WinDbg.
The point is this: when you know you are sending an engineer off on a particularly challenging or unpleasant task, I think it’s important to communicate that you understand how awful it is, and to do your best to help out.