In the past 15 years, I’ve become a huge fan of unit tests. Whenever I start a new project, one of my first questions is “how can I unit test this?” Admittedly, many of my unit tests tend to be heavyweight: less “unit” and more “macro.”
For example, in working on Binding Tools for Swift, I need tests that write Swift code, compile that Swift code, run my tooling on it (which generates an API), and then write C# code that exercises that API and prints out some useful indication that it ran. The actual test reads that output and compares it to an expected output. It’s not pretty, but it works. Or at least I have 1545 indications that it works so far.
From time to time, it’s been necessary to do a wide-reaching refactoring of the code base. It is at these times that I’m very happy to have unit tests that exercise a wide range of my code. Recently, it became apparent that my old code for reflecting on compiled Swift code, which is a custom build of the Swift compiler, was not going to work as is. Fortunately, Swift has added the ability to produce a text-only version of a module’s public-facing API: the swiftinterface file. This file uses a subset of the Swift grammar, so I wrote a parser for it using ANTLR that generates (ostensibly) the same XML representation of reflection information as the compiler-based version. After a couple of months of implementation and some basic testing (I have ~100 tests on the XML reflection code), I unleashed it on my whole test suite. I got about 800 failures.
After a day and a few PRs, I had that down to about 422 failures. After the next day, it was down to 212 failures. After the next day and more PRs, it was around 100. See the pattern? Every day I was cutting the failures roughly in half, which is exponential decay with a half-life of about one day. This happens often enough that whenever I work on a refactoring like this, I start up a spreadsheet with the number of failures over days. This does two things: it feeds my confirmation bias about the hypothesis and, more importantly, it lets me see the progress over time.
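If you’d rather script the spreadsheet, here is a minimal sketch of the same idea. It assumes the failure counts from this refactoring (800, 422, 212, and roughly 100, treating that last one as exact) and fits them to N(d) = N0 · r^d by least squares on the logarithm of the counts. The function names are hypothetical, made up for illustration. Because a pure exponential never actually reaches zero, the sketch asks when the model drops below a threshold rather than when it hits zero.

```python
import math

# Daily failure counts from the refactoring described above.
# The last value is approximate ("around 100").
failures = [800, 422, 212, 100]


def daily_ratios(counts):
    """Ratio of each day's failure count to the previous day's."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


def fit_decay(counts):
    """Least-squares fit of log(count) = log(N0) + day * log(r).

    Returns (n0, r): the fitted starting count and the per-day
    retention ratio (r = 0.5 means failures halve every day).
    """
    days = list(range(len(counts)))
    logs = [math.log(c) for c in counts]
    n = len(counts)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


def days_until_below(n0, r, threshold=1.0):
    """Days from day 0 until n0 * r**d first drops below threshold (needs r < 1)."""
    return math.ceil(math.log(threshold / n0) / math.log(r))


if __name__ == "__main__":
    print("day-over-day ratios:", [round(x, 3) for x in daily_ratios(failures)])
    n0, r = fit_decay(failures)
    print(f"fitted N0 = {n0:.0f}, r = {r:.3f}")
    print("days until fewer than 1 failure:", days_until_below(n0, r))
```

On these numbers the fitted ratio comes out very close to 0.5, which is the “half a day’s failures gone every day” pattern. Fitting in log space turns the exponential into a straight line, so a plain linear regression is enough and no curve-fitting library is needed.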
Try it some time and see how you do.
Now let’s think about the why. What I think happens is that in reasonably factored code you get bottlenecks: places where there is a lot of code traffic. Issues in a bottleneck are going to cause a lot of failures. Getting rid of an issue in a bottleneck removes a large number of failures and also opens up access to new failure bottlenecks (usually more old failures are removed than new ones are uncovered). Eventually, when the bottlenecks have been cleaned out, you’re left with issues in the fringes.
That’s my story and I’m sticking with it.