I’m Old, Part XL: When Cycles Mattered

When I was 20, I took a year off college and worked at Bell Communications Research. I was a contractor in a group that was doing, primarily, human-computer interface research. Within the department, most of the people worked on Sun 3 workstations, which ran a 68020 at about 16MHz with a 1-bit display. They came with a windowing system, SunTools, which ran like a dog.

This was a different era for computing. Men were real men. Women were real women. And small furry creatures from Alpha Centu…sorry. And every cycle mattered if you wanted an app that didn’t run like a dog.

There was an engineer in the group named Steve Uhler, who had a wonderfully refreshing love/hate attitude towards these machines. He couldn’t stand SunTools, so he wrote his own windowing system, MGR. It was built to squeeze as many cycles out of the CPU as he could manage without reverting to assembly and it had a very interesting mode of operation. Most windowing systems at the time were written so that they had to run natively on the CPU of the computer. X had ways of running on a different machine, but early on it didn’t work particularly well. Instead, MGR ran on your local system and programs that wanted to use graphics would use a set of macros that generated escape codes to perform graphics operations on the windowing system. Essentially, if you wanted to draw a line, you did a printf(). To make your life easier, Steve made a bunch of macros which did that for you. In addition to basic graphics, there were higher level routines that could make new windows, make menus, resize windows, move windows, and so on.

Under the hood, Steve had written some really intense C to make sure that this all happened very quickly. In his bit-blitting code, he had macros that unrolled the loops that in turn had macros that handled various bit-level operations (AND/OR/XOR/etc). The code ran very, very quickly and as a result the UI felt very snappy. In addition, Steve had an interesting solution to the “partially obscured window” problem. Most systems are built to not care about obscured windows and notify the app when they’ve been uncovered so the app can repaint the window. Not Steve. He kept a backing store for each window and all operations went into the backing store. When a window was exposed, he just blitted the backing store on top and ta-da, the window was there. His code also routinely broke C preprocessors. I mentioned earlier that Steve had a love/hate attitude, and he loved to hate compilers. If the department got a new machine or there was a new, better C compiler, Steve would break it and complain with a smile.

The cool thing was that if you had some serious number crunching to do, you could do it on a serious number crunching computer and #include the MGR header for spitting out the graphics. It worked astoundingly well.

I wrote a ton of code for MGR, including a UI front end to gnuchess, a higher level UI library for doing dialog boxes and control widgets, a tool to display the output of pic, Hinton diagram display code, and a raft of graphics hacks including a hack that had a bunch of walking eyeballs.

I also ported MGR to the Macintosh. That was interesting because, while Steve’s bit-blit code worked on the Mac, it didn’t run as well as native Mac apps. This shouldn’t have been a surprise. The Mac I used ran at 8MHz, not 16MHz, and had a 16-bit data bus, not a 32-bit data bus. Also, QuickDraw, the native graphics API, was written in carefully tuned assembly language. So I found the bottlenecks in Steve’s bit-blit code and replaced them with QuickDraw calls, and hooray, everything worked and ran at close to the same speed as the Sun.

And while I enjoyed shaving cycles in my code, I don’t miss it all that much. Yes, I know that software has bloated to fill memory and CPU capacity, but at the same time it’s very straightforward today to run a decent profiler, find the main hot spots and cool them off. And this can typically be done in a way that doesn’t compromise the readability or intent of your code.

The last time I talked to Steve, he was working on research projects related to the Java VM. He had the same love/hate attitude while he was talking about work that tracked local variables, and code that he put in to make them stack allocated instead of heap allocated so you wouldn’t burden the garbage collector with things that were short-lived and guaranteed to be collectable at method exit.

Bresenham Line Algorithm in F#

I love the Bresenham line algorithm. It’s an old algorithm that’s been around for more than half a century since it was created to drive plotters. One of the cool things about it is that it requires nothing more than addition, subtraction, and multiplication by 2 (usually done as a bit shift or an add). Another cool thing is that it lends itself quite nicely to being implemented in F# tail-recursively. I did this as a mental exercise in Rosetta coding to see what it would look like in F# (spoiler: it looks really nice). The reference code I started with was in C and can be found here (PDF), which has a pretty great description of the algorithm and how to optimize the living daylights out of it. The F# code is optimized for reading.

Going into this, I’ll say that the algorithm has 8 separate cases for the 8 octants of the Cartesian plane, but with a little planning you can get this down to 2 octants (one x-dominant, the other y-dominant), and honestly, the two octants are nearly identical.

Since F# is a functional language, we’re going to solve the problem inside out by deferring the work that must be done at each pixel to an external function:

type PixelFunc = int -> int -> unit

which we will call with the (x, y) coordinates of a pixel that should be set. Easy.
Next, here’s the internal x-dominant octant case:

let Octant0 xdir =
    let dy2 = dy * 2
    let dy2dx2diff = dy2 - (dx * 2)
    let rec Octant0Rec x y count error =
        if count = 0 then ()
        else
            let (y0next, errornext) =
                if error >= 0 then (y + 1, error + dy2dx2diff)
                else (y, error + dy2)
            pf (x + xdir) y0next
            Octant0Rec (x + xdir) y0next (count - 1) errornext
    pf x0 y0
    Octant0Rec x0 y0 dx (dy2 - dx)

You should be aware that this internal function refers to a few variables from an outer scope: dx, the absolute value of Δx; dy, the absolute value of Δy; and x0 and y0, the starting pixel. Essentially, Octant0Rec will draw count pixels starting from an initial x and y, stepping along x (either positively or negatively) and conditionally staying on the same y or moving when the error gets too big. Octant0 sets up dy times 2 and the difference between dy times 2 and dx times 2. So simple.
So now that we have that out of context, let’s put it in context with the whole routine:

let DrawLine ix0 iy0 ix1 iy1 (pf:PixelFunc) =
    let (x0, y0, x1, y1) =
        if iy0 > iy1 then (ix1, iy1, ix0, iy0)
        else (ix0, iy0, ix1, iy1)
    let tdx = x1 - x0
    let (dx, xinc) = if tdx < 0 then (-tdx, -1) else (tdx, 1)
    let dy = y1 - y0
    let Octant0 xdir =
        let dy2 = dy * 2
        let dy2dx2diff = dy2 - (dx * 2)
        let rec Octant0Rec x y count error =
            if count = 0 then ()
            else
                let (y0next, errornext) =
                    if error >= 0 then (y + 1, error + dy2dx2diff)
                    else (y, error + dy2)
                pf (x + xdir) y0next
                Octant0Rec (x + xdir) y0next (count - 1) errornext
        pf x0 y0
        Octant0Rec x0 y0 dx (dy2 - dx)
    let Octant1 xdir =
        let dx2 = dx * 2
        let dx2dy2diff = dx2 - (dy * 2)
        let rec Octant1Rec x y count error =
            if count = 0 then ()
            else
                let (x0next, errornext) =
                    if error >= 0 then (x + xdir, error + dx2dy2diff)
                    else (x, error + dx2)
                pf x0next (y + 1)
                Octant1Rec x0next (y + 1) (count - 1) errornext
        pf x0 y0
        Octant1Rec x0 y0 dy (dx2 - dy)
    if dx > dy then Octant0 xinc
    else Octant1 xinc

The first thing the setup code does is order the starting and ending (x, y) coordinates; then it calculates the deltas and, based on whether x or y is dominant, selects the appropriate octant code.

So we can see that you can implement Bresenham’s line algorithm in a way that is readable, tail-recursive, and uses nothing but immutable values.
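As a quick check (my example, not from the original), here’s DrawLine driven by a PixelFunc that just records each pixel. The line from (0, 0) to (5, 2) is x-dominant, so it runs through Octant0:

```fsharp
// Assumes the PixelFunc type and DrawLine function defined above.
let mutable pixels : (int * int) list = []
let recordPixel : PixelFunc = fun x y -> pixels <- (x, y) :: pixels

DrawLine 0 0 5 2 recordPixel
printfn "%A" (List.rev pixels)
// [(0, 0); (1, 0); (2, 1); (3, 1); (4, 2); (5, 2)]
```

Six pixels for a line with Δx = 5: the first pixel plus dx steps, with y advancing whenever the accumulated error goes non-negative.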

Swift Argument Passing Protocol

Something that is not particularly well documented in the Swift project is how to call Swift functions. Swift has a relatively simple argument passing protocol, but the simplicity is ruined by special cases that make it rather inconvenient to work with at times.

First, let’s start with the simplicity. Swift has two general ways to pass arguments: by reference or by value. Generally speaking, value types get passed by value and reference types get passed by reference. See, isn’t that easy?

So what is a reference type in Swift? Simple: any class is a reference type. What’s a value type? Simple, structs, enums, and scalar types (Bool, Int, Float, etc).

Now the special cases: value types get passed by value if and only if the size of the value type is 3 machine pointers or fewer; if they exceed that, a copy gets passed by reference. Why is this a special case? Because pretty much every ABI for current processors says that value types are passed by value if the size of the value type is 2 machine pointers or fewer. Great. Thanks, Swift, you’re the best.

Now, when I say that values get passed, there’s more to it. Since Swift uses reference counting for its memory management, there is infrastructure that you need to care about quite a bit. Whenever a function gets called in Swift, the caller will bump the reference count of any reference types (or reference types contained within value types) and the callee will drop the reference count before returning. The problem with this, and this is an important one, is that if you are calling a Swift function, you must bump the reference count of everything you pass in. If you don’t, Swift will dutifully decrement the reference count on the way out and may cause the object to get disposed out from underneath you.

OK great – supposing that I have a Swift data type, how do I make this happen? If you have a reference type, this is easy. In the library libSwiftCore, there is an exported function called swift_retain to do just this. To drop the count (if you were called by Swift), you would call swift_release. The problem (and trust me, there are problems every step of the way) is when you get called by Swift. You’re supposed to drop the reference count on the way out. Fortunately, swift_release is also there for you.
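To make that concrete, here’s a sketch in F# of what binding those entry points might look like from non-Swift code. The export names swift_retain and swift_release are real, but the library name and the exact P/Invoke signatures here are my assumptions, so treat this as illustrative, not definitive:

```fsharp
open System.Runtime.InteropServices

// swift_retain takes a pointer to a Swift heap object and returns it;
// swift_release takes the same pointer and drops the count. The signatures
// and library name ("libswiftCore.dylib" on macOS) are my assumptions.
[<DllImport("libswiftCore.dylib", EntryPoint = "swift_retain")>]
extern nativeint swift_retain(nativeint obj)

[<DllImport("libswiftCore.dylib", EntryPoint = "swift_release")>]
extern void swift_release(nativeint obj)

// Before calling into Swift, bump the count on anything you pass;
// the Swift callee will drop it on the way out.
let callIntoSwift (handle: nativeint) (swiftFunc: nativeint -> unit) =
    swift_retain handle |> ignore
    swiftFunc handle
```

This only covers reference types; as described below, value types need a different mechanism entirely.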

Great, we have swift_retain and swift_release in our tool belt. We’re all set, right? Nope. Like I mentioned, we care quite a bit about value types that contain reference types. You would like swift_retain and swift_release to work on value types too, but they don’t – and there’s a really good reason why. Value types are supposed to contain the minimum amount of information possible. For example, in Swift a Bool is really one bit (usually padded out to a byte or more). Swift doesn’t attach the overhead of object layout to value types. Instead, for every type in Swift, there are a few data structures available to the compiler/runtime that are used for that information. The main three are the Metatype (or Metadata), Nominal Type Descriptor, and Value Witness Table. Between these three blocks of data in Swift, you can find the following information:

  • Fields
  • Field Types
  • Size of the type
  • How to manipulate the data type

To get the Metatype of any particular data type in Swift, there is a symbol you can access, named something like __TMd<more mangling>, which is the type Metadata for that type. The Metatype has a pointer to the Nominal Type Descriptor, which will tell you the size of the type.

The Value Witness Table can also be accessed through a symbol associated with the type. It will be named something like __TWV<more mangling>. It’s not a function (no F after the T); it is a symbol associated with the table itself. Wait. Value Witness What? What the heck is it?

It is a set of 19 function pointers to methods that do a set of operations on Swift value types, such as “copy this Swift value type to this uninitialized location” or “copy this Swift value type to this previously initialized location” or “take this Swift value type from this location (dropping reference counts) and copy it to this uninitialized location”. And in here are the pieces that you need to effect a swift_retain or swift_release on a value type. When Swift calls Swift with a value type, it uses these methods (usually inlined) to do a copy-to-uninitialized for passing the value type. The callee will do a similar call to do a take operation, which will drop the reference count.

So if you want to pass a Swift value type to a Swift method, you will need to make a scratch copy using one of the copy-to-uninitialized flavors in the Value Witness Table of the type.

Great! Fantastic – all set, ready to go!

Not so fast. Things get interesting if you’re dealing with generic types.

See, the argument passing rules change quite a bit with generic types.

If you’re calling a function with generic arguments in Swift, all generic types are passed by reference, even value types. This makes sense because if I have the following function in Swift:


public struct Foo<T> {
    public var x:T
    public init(a:T) {
        x = a
    }
}

public func printX<T>(value:Foo<T>) {
    // ...
}

How big is value in printX? Does it fit in 3 machine pointers or fewer? We don’t know. Sorry, sucks to be you.
The question is, how does Swift know how to do this? You would think, “I know, I just need to get the Metatype and the Value Witness Table and then I can…” Not so fast. In generic types, these symbols don’t exist. To get the Metatype, you need to call a function to get it. This function is named something like __TFMa<more mangling>, which is the Metatype accessor function, but you can’t simply call it as a function of no arguments. And here, once again, Swift gets, well, interesting. Usually, Swift functions are explicit in the name mangling as to how many arguments they take and the types. Any function that takes generic arguments may take extra implicit arguments tacked onto the end. The Metatype accessor is one of those. If a Swift type takes 0 generic arguments, the Metatype accessor takes 0 extra arguments. If the Swift type has generic parameters, then the Metatype accessor takes 1 argument for every generic parameter on that type and that argument is…ta-da, the Metatype for the generic specialization. So if I’m calling the Metatype accessor for Foo<Int>, I need to pass in the Metatype for Int. This will return a (possibly) dynamically constructed Metatype for Foo. Great – so there must be a similar method in there to get the Value Witness Table too, right? No. Sorry. No soup for you.
If you want to get the Value Witness Table of a generic type, you first need to get the Metatype. Then you need to look one machine pointer behind that to get a pointer to the Value Witness Table. In fact, if you’re accessing a non-generic type’s Metadata through the __TMd… symbol, a pointer to the Value Witness Table is stored a machine pointer behind it as well. Nifty.
Did I mention special cases earlier? I think I did. Let’s have some more.
Suppose I want to call printX() as outlined above. I’m sure you’ve figured out that I can’t just call it because printX needs the Metatype of T in order to get the Value Witness Table in order to drop a reference count on value on the way out, if needed. So as it turns out, given any generic Swift function, you need to pass in the Metatypes of the generic parameters, one for each unique parameter, after all the regular parameters. But that’s OK, since we now know how to do that.
Did I mention special cases? Yeah. Suppose that I have this:

public func myFunc<T>(a:someClass<T>, b: T) {
    // ...
}

How many extra arguments are there? 1? Yes? No. 0. See, when you have a reference type specialized on a generic parameter, the Metatype of that generic parameter is embedded within the instance itself and the Swift compiler knows how to get at it, so it’s not needed as a parameter to myFunc.
So the rule is this: for every generic parameter in a function, f, you need to pass in the Metatype for the specialization of the type appended after all the arguments, unless that generic parameter is specialized in a reference type parameter.

So if you’re thinking to yourself, hey – there’s this cool Swift library that I’d like to call from my code – it’s just a function, right? Think again. First, the Swift ABI is not compatible with standard platform ABIs. Second, some of the other calling conventions get interesting because of reference counting. Third and finally, generics will make your life very interesting. So why did you want to do this again?

I’m Old, Part XXXIX: The One True Brace Style

There are few things that can rouse the ire of a programmer more than inconsistent or unusual brace style. At the same time, there are very few things that matter less in your code.

I will admit to favoring something close to K&R. If you’re curious, Wikipedia has a complete list of styles including the correct style and a bunch of abhorrent variations. Ha-ha! I kid! All of the brace styles in there are terrible. To somebody.

Before Adobe, I mostly worked solo on projects, and the code style was pretty uniform across the project as a result. Working on Acrobat was a whole different matter. The Mac-only code base was written by three or four engineers and included a fair amount of cross-platform code that was written by several other people. All of us had varied backgrounds. Of course I followed K&R; I learned C at Bell Labs.

Still, each of us carved out our own special chunks of functionality and morphed them into our own styles. Where things got funny was when we had to interact over code. Pair programming, really. This usually happened when someone was stuck on a problem. This happens all the time and usually the answer is right in front of you, but you need another set of eyes to see the missing semicolon or uninitialized local or whatever.

Alan Wootton, who was the anchor of the Mac team, was particularly good at spotting problems. However, there was a price to getting his help. Alan had a process of working through other people’s code, and part of that process was to manually transform the brace style in front of him into his own, which was closer to Allman, which is more like Pascal style. Not surprising, really; Alan did a lot of Pascal programming.

But watching it was like watching someone go through an art gallery of your own work and nudge all the paintings out of square. Brrr.

But he’d inevitably find the issue, fix it for you and check it in, making all his brace changes permanent.

Here’s the thing though, and I know it’s heretical: it doesn’t matter. Honestly. Brace style doesn’t matter. Getting wound up over brace style is akin to getting worked up over pop/soda/coke or sub/grinder/hoagie. What matters is the code in the bigger picture. How is it structured? Does it make sense and strike the right balance of abstraction? If you’re having trouble reading the code, that’s a smell. Go ahead and attend to it, but be prepared to embrace any style.

Partial Calculations in F#

I’ve been playing around with a little project in F# and got stuck on a little problem: how can I use partial function application to do a small amount of work and return a function that I can call later to do more?

To begin, let’s start with the canonical example of factorial:

let fact n =
    let rec factrec acc curr =
        if curr = 0 then acc
        else factrec (acc * curr) (curr - 1)
    factrec 1 n

This is a tail-recursive implementation of factorial that uses an inner function for the actual calculation.
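As a quick sanity check (my example, assuming the fact function above):

```fsharp
// 5! = 5 * 4 * 3 * 2 * 1 = 120
printfn "%d" (fact 5)  // prints 120
```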

The problem with this style is that while it writes well, it doesn’t take into account things that your typical user might want, for example, I/O cover-up. In other words, in a lengthy calculation, you might want a spinning cursor or a progress bar or any number of other things to make the user aware that the computer is working and hasn’t hung.

As a side note, there was I/O cover-up built into Adobe Illustrator on the Mac. IIRC, instead of having code that took a long time spin the beach ball cursor once in a while, they installed a VBL interrupt task to spin the beach ball for them so they didn’t have to modify the code at all. This spun the cursor very smoothly, but was anathema to the purpose of the feedback because the main program could very well be hung and the cursor would continue to spin. Oops.

So let’s look at how we can rewrite this code to only do a little work at a time and be able to pick up where we left off. There are two main cases that we care about: a result or more work to do. That feels like a discriminated union to me.

type Partial<'a> =
    | Result of 'a
    | Continuation of (unit -> Partial<'a>)

The important case here is the continuation, which encapsulates a function of no arguments that returns a Partial.
So given the partial type, we can write a function that executes the partial to completion.

let rec performUntilDone (cont:Partial<'a>) =
    match cont with
    | Partial.Result (r) -> r
    | Partial.Continuation(f) ->
        printfn "Continuing..."
        performUntilDone (f ())

From here, we can rewrite our factorial function to use the Partial type:

let factpartial n =
    let rec factrec acc curr () =
        if curr = 0 then Partial.Result(acc)
        else Partial.Continuation(factrec (acc * curr) (curr - 1))
    Partial.Continuation(factrec 1 n)

The differences are few, but notable. First, the inner function has a new argument of type unit. This is because we’re going to partially apply that function, and without the extra unit argument, supplying acc and curr would be a complete application rather than a partial one. For the base case of curr = 0, we return a Partial.Result(acc) instead of acc, and instead of the recursive call, we return a Partial.Continuation of the partial application of the inner function.
When we run this with 5, we get this output:

Continuing...
Continuing...
Continuing...
Continuing...
Continuing...
Continuing...

What if instead we wanted more information about the calculation that’s going on? For that, we need to redefine the Partial type to also encapsulate intermediate results, redefine the performUntilDone function, and change factorial to use the new PartialIntermediate type.

type PartialIntermediate<'a, 'b> =
    | Result of 'a
    | Continuation of 'b * (unit -> PartialIntermediate<'a, 'b>)
let rec performIntermediateUntilDone (cont:PartialIntermediate<'a, 'b>) =
    match cont with
    | PartialIntermediate.Result(r) -> r
    | PartialIntermediate.Continuation(intermediate, f) ->
        printfn "Continuing (%A)" intermediate
        performIntermediateUntilDone (f ())
let factintermediate n =
    let rec factrec (acc:int) curr () =
        if curr = 0 then PartialIntermediate.Result(acc)
        else PartialIntermediate.Continuation((acc, (factrec (acc * curr) (curr - 1))))
    PartialIntermediate.Continuation((1, factrec 1 n))

When you run this, you get this output:

Continuing (1)
Continuing (1)
Continuing (5)
Continuing (20)
Continuing (60)
Continuing (120)

We can see that it’s very easy to take traditional tail-recursive code and transform it into code that can run incrementally. Certainly, there are other ways this code could be structured. For example, you could rewrite the function to take a callback function and run it in a thread. Both approaches have their benefits and drawbacks.
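The callback variation can also be sketched without threads. Here’s a hedged, single-threaded take (my own code, assuming the Partial type and factpartial from above) where the caller supplies a tick function that runs between steps, which is exactly where I/O cover-up would live:

```fsharp
// Runs a Partial<'a> to completion, invoking 'tick' between steps.
// 'tick' is the hook for a spinning cursor, progress bar, etc.
let rec performWithTick (tick: unit -> unit) (cont: Partial<'a>) =
    match cont with
    | Partial.Result r -> r
    | Partial.Continuation f ->
        tick ()
        performWithTick tick (f ())

// Prints one '*' per continuation while computing 5!
let answer = performWithTick (fun () -> printf "*") (factpartial 5)
printfn "\nanswer = %d" answer  // answer = 120
```

The design point is that the work code stays pure; all the UI feedback lives in the driver, which is the opposite of the Illustrator VBL hack described earlier.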