I’m Old, Part XVI: One of Many Screw Ups

If you write code, you write bugs. It’s inevitable; it’s human. It’s what happens after that’s important.


As I got to be a more experienced programmer, I learned habits that get bugs caught either by the compiler (ideally) or very quickly in the test cycle. Mind you, the tools have gotten much better, and there is less reliance on things like writing custom allocators or sub-allocators, writing your own linked list or growable array code, and so on.

The first printer I worked on at Adobe, for DEC, had a chunk of EEROM/NVRAM on the controller. It had something like 128 bytes available and I didn’t own all of them. Since we were creating a PostScript cartridge, we had to share the memory with the hosting controller. As the project went along, a number of things got added to the NVRAM, not the least of which was the serial port configuration, which got complicated since DEC wanted a bazillion baud rates, stop bits, parity, handshaking, and so on.

After all of the changes, I had coded a very subtle bug into the NVRAM handler. The read/write code worked just fine if the NVRAM had been initialized, but if it had never been initialized, the printer would execute a dreaded routine in the PostScript code base called CantHappen().

If you ever hit that routine in a debug build, the printer dropped into its low-level monitor and you could walk the stack to figure out how you got there. If you hit that routine in a release build, the printer rebooted.

Can you see the problem?

DEC sent the final accepted code off to a facility to make masked ROMs, which is an expensive process with a long lead time, but in quantity, it’s far cheaper than using EPROMs or One Time Programmable (OTP) devices/cartridges. DEC had gotten the first shipment of masked ROMs built into cartridges and when plugged into a printer, the printer would boot and start PostScript. The NVRAM was uninitialized and this triggered the failure. The code executed CantHappen(), then rebooted and the cycle repeated.
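For the curious, here is a minimal sketch of the kind of check that avoids this loop. It is not the original Adobe code, and everything in it is an assumption made for illustration: the five-byte layout, the magic value, the checksum, the defaults, and all of the names. The idea is simply that the loader treats NVRAM it does not recognize as a normal case and writes defaults, instead of treating it as something that can't happen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of our slice of NVRAM; the real map is not described. */
enum { OFF_MAGIC = 0, OFF_BAUD = 1, OFF_STOP = 2, OFF_PARITY = 3, OFF_SUM = 4, CFG_LEN = 5 };

#define NVRAM_MAGIC 0xA5u /* assumed marker meaning "this region was initialized" */

/* Sum of every byte before the checksum slot, modulo 256. */
static uint8_t cfg_checksum(const uint8_t *cfg)
{
    uint8_t sum = 0;
    for (int i = 0; i < OFF_SUM; i++) {
        sum = (uint8_t)(sum + cfg[i]);
    }
    return sum;
}

/* A region is valid only if the marker, the field ranges, and the checksum all agree. */
static int cfg_is_valid(const uint8_t *cfg)
{
    return cfg[OFF_MAGIC] == NVRAM_MAGIC
        && cfg[OFF_STOP] >= 1 && cfg[OFF_STOP] <= 2
        && cfg[OFF_PARITY] <= 2
        && cfg[OFF_SUM] == cfg_checksum(cfg);
}

/* Write assumed factory defaults: first baud-table entry, 1 stop bit, no parity. */
static void cfg_write_defaults(uint8_t *cfg)
{
    cfg[OFF_MAGIC] = NVRAM_MAGIC;
    cfg[OFF_BAUD] = 0;
    cfg[OFF_STOP] = 1;
    cfg[OFF_PARITY] = 0;
    cfg[OFF_SUM] = cfg_checksum(cfg);
}

/* Returns 1 if the region had to be initialized, 0 if it was already valid. */
static int cfg_load(uint8_t *nvram)
{
    assert(nvram != NULL);
    if (cfg_is_valid(nvram)) {
        return 0;
    }
    /* Uninitialized or corrupt NVRAM is an expected state, not a fatal one. */
    cfg_write_defaults(nvram);
    return 1;
}
```

Calling cfg_load() on a buffer of all 0xFF bytes (a guess at what a never-written part might read back as) initializes it and returns 1; calling it a second time on the same buffer returns 0.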

DEC called us and they were, understandably, at their wits’ end. They had no idea what had happened and what was going on.

It took me a couple days to figure it out and once we figured it out, the negotiations began. We explored all the options. DEC needed to get the product pipeline going and waiting for more masked ROMs was not in the cards. We talked about covering a set of OTP cartridges that they could use until more ROMs were burned – that was possible, but not ideal because OTPs of that size were costly.

I suggested an alternative – not ideal, but it worked. I could write a “special” version of PostScript that had no PostScript, but when it booted, it would detect uninitialized NVRAM and write to it in a way that the faulty cartridges would accept it, then put a message on the front panel letting the user know that the printer was ready. It was awkward and confusing for the user, but they could get this code into the smallest OTP and add it into the box with a note to use the initialization cartridge first. Adobe offered to cover the cost of the OTP cartridges for DEC until a new set of masked ROMs could be made.

Because of this, I made our printer QA rather unhappy. Whenever a new class of bug is found, printer QA gets another task in their long list of things to do. In this case, part of every acceptance test for every printer from then on that had NVRAM included “remove the NVRAM and put in a new chip, boot the printer, ensure the NVRAM has been properly initialized.”

While I worked on the “fix it” cartridge, I had a paper cup on my desk of “used” NVRAM. They were perfectly good chips, but they all had been initialized and were unsuitable for my testing. It didn’t take me too long to write the fix, but it did take time to go through both our and their QA departments.

Nobody foresaw this bug and I don’t know if there was a good way to have done so. I think the takeaway from this situation is to do what we did: understand the nature of the bug, list out all the solutions and pick the one that works best for everyone in the circumstances.

I’m not proud of creating the problem, but I was happy with the solution. DEC was not happy with the bug, but I think they were happy with how Adobe management handled the situation. They were happy enough to continue a working relationship with Adobe and happy enough to have me on their next product.

But that’s a story for another day.
