I’m Old, Part XXXI: Geese. Oh the Geese.

When I worked at Bell Communications Research, I was in the Morristown office, which had two buildings on the site, some green space and a couple small ponds.

[Image: the Bellcore Morristown site]

I worked in the building with the three wings. The building had 3 floors and each wing had 3 corridors that ran from the center to the end (most of the time – sometimes labs blocked the center corridor). Each wing also had cross corridors.

The really entertaining thing about this layout was that instead of, say, putting letters on the corridors and numbers on the offices, they let a mathematician choose the numbering scheme and he chose polar coordinates. He divided the layout into wedges, and then the numbers ran from low to high from the center out. Which would be a fantastic way to do it if offices weren’t so chunky. The end result was that you could end up with an office with a D letter right next to an office with an M letter. Finding anything close to the hub was a pain in the ass.

When I started, each floor was a different color: red, green and blue, but over time they repainted and shifted the colors so the green was more a sea foam and the blue was closer to indigo.

At the time, the buildings had no direct communications link between them. There were tunnels that connected them, but they wouldn’t run fiber or copper through them. If you wanted to move data from one building to the other, you used a 1200 baud connection. In a brilliant move of ad-hoc infrastructure, two offices that faced each other installed a laser-based communications system to pipe data between the buildings. This was all well and good except for the geese.

The laser connection was fairly fault-tolerant and could cope with a songbird blocking the beam, but it couldn’t handle the blackout time caused by the geese.

Since the site had two ponds and some green space, it was an ideal location for Canada geese to nest. There were geese at that site all the effing time. Which meant that all the green space was covered with feces. And the parking lot. And the sidewalks.

For the most part, the geese were benign. If they were in my way, I would spread my arms out, which is a dominance display. A goose that sees that backs away from the sight of someone with a huge wingspan.

Except when they’re nesting. Then the ganders not only don’t back down, they get aggressive. Security would put out traffic cones around the nesting sites to keep people from parking there, but the geese didn’t care much about that. At one point, I was biking home and on the way out a gander clipped me on my helmet. I stopped immediately and grabbed my bike pump off the frame, ready to take the gander out if I had to. It retreated.

Buildings and grounds tried a couple things to reduce the goose population. The first thing they did was put out foam swans. In theory geese and swans don’t like each other and geese won’t land where there are swans. Do you know the difference between theory and practice? In theory they’re the same, but in practice they’re different. The geese ignored the foam swans. One of the swans ended up decapitated and drifted around that way for months.

Then they tried stringing a lattice of wire over the ponds thinking that if they made the ponds inaccessible, the geese would find somewhere else. As it turns out, geese can land perfectly well on land and walk into the water and duck under the wires.

No luck.

There was no end to the geese.

A Little C# Extension Gem

Over the years, I’ve written this loop in countless languages, countless times:

bool first = true;
foreach (var value in collection) {
    if (!first)
       WriteToOutput(", ");
    first = false;
    WriteToOutput(value.ToString());
}

Or something similar. It’s easy enough to write, but for the sake of DRY (Don’t Repeat Yourself), I decided to finally put this one to bed by writing a C# extension method.

public static IEnumerable<T> Interleave<T>(this IEnumerable<T> contents,
            T separator, bool includeSeparatorFirst = false) {
    bool first = true;
    foreach (T t in contents) {
        if (!first || includeSeparatorFirst)
            yield return separator;
        first = false;
        yield return t;
    }
}

The previous chunk of code would change to:

foreach (string s in collection.Select(value => value.ToString()).Interleave(", "))
    WriteToOutput(s);

If you’ve ever written code to write code, this is a very good thing, but it could be even better. Let’s build an add-on that’s better tuned for code:

public static IEnumerable<T> BracketInterleave<T>(this IEnumerable<T> contents,
            T start, T end, T separator, bool includeSeparatorFirst = false) {
    yield return start;
    foreach (T t in contents.Interleave(separator, includeSeparatorFirst))
        yield return t;
    yield return end;
}

Suppose you have an array of bytes that you want to write out as a declaration. You can do that with the following:

public void WriteArrayDecl(string visibility, string name, byte[] data) {
    WriteToOutput($"{visibility} byte[] {name} =");
    foreach (string s in data.Select(b => String.Format("0x{0:X2}", b)).BracketInterleave("{\n    ", "}\n", ", "))
        WriteToOutputWithWrapping(s);
}

So we can see that with a little work, an extension method makes it easier to avoid repeating that old pattern for common tasks like writing CSV files, generating code, or anything similar.
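For instance, here’s a quick sketch of the CSV case. WriteCsvRow is a name made up for illustration (it isn’t from any library), it ignores CSV quoting and escaping entirely, and it assumes it lives in a static class alongside the extension methods above with the usual System.IO and System.Collections.Generic usings:

public static void WriteCsvRow(TextWriter writer, IEnumerable<string> fields) {
    // Interleave puts a comma between fields, but not before the first or after the last.
    foreach (string s in fields.Interleave(","))
        writer.Write(s);
    writer.WriteLine();
}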

The Swift Type System is Not Your Friend (yet)

In OOP systems (and let’s be frank, Swift is an OOP system with functional aspirations), there are two general categories of languages: those in which all objects descend from a single class, and those which have a more disparate set of types. In .NET, Object is at the root and has a minimal set of methods for common behaviors (8 methods, to be precise). In general, there are two varieties of types in .NET: objects and value types, but all value types can be boxed into objects and can then be treated as objects.

This is not the case in Swift. Swift has the following varieties of types:

  • Value types (scalars, structs, enums, optionals, exceptions)
  • Reference types (objects)
  • Protocol lists
  • Tuples
  • Closures
  • Metatypes

Unfortunately, these types don’t always share things in common. Commonality and uniformity are nice things to have across the members of a type system, which is why I prefer the .NET/JVM monolithic type system.

Today, in particular, I’m going to pick on the Swift Dictionary type. A dictionary is a mapping of keys to values using a mapping function of some kind, usually a hash function. In .NET/JVM, this is easy because every object implements Equals() and GetHashCode()/hashCode(), which is all you need to implement a hash table with collision resolution. In Swift, a dictionary is not simply a generic collection Dictionary<TKey, TValue>. It’s a little more complicated: TKey must conform to the protocols Hashable and Equatable (or have a global == operator defined).

And that’s all well and good, but that means that you can only put things into a dictionary that implement those protocols.
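For a type you control, that’s straightforward. Point here is just a made-up example showing what the conformance looks like (newer Swift versions can synthesize this for you, or you’d implement hash(into:) rather than hashValue):

struct Point: Hashable {
    var x: Int
    var y: Int
    // Hashable (which implies Equatable) is what lets Point be used as a dictionary key.
    var hashValue: Int { return x.hashValue ^ (y.hashValue &* 16777619) }
    static func ==(lhs: Point, rhs: Point) -> Bool { return lhs.x == rhs.x && lhs.y == rhs.y }
}

var labels = [Point: String]()
labels[Point(x: 0, y: 0)] = "origin"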

Now consider that you have a system that must respond to arbitrary types and you want to build a dictionary of types mapped onto response functions. But you can’t do this. At least not directly. This is because metatypes aren’t objects and don’t adopt protocols. You’re not fully SOL because you can substitute ObjectIdentifier(metatype), but that means an extra object/value. Guh. What else can’t go into dictionaries? Tuples. Tuples can’t go into dictionaries. Probably because tuples can’t guarantee that all their elements implement Hashable and Equatable.
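To make that workaround concrete, here’s a minimal sketch of the ObjectIdentifier substitution in current Swift syntax. The register and respond helpers are made up for illustration, not from any real framework:

var handlers: [ObjectIdentifier: (Any) -> Void] = [:]

// Metatypes aren't Hashable, so key the dictionary on ObjectIdentifier(T.self) instead.
func register<T>(_ type: T.Type, handler: @escaping (Any) -> Void) {
    handlers[ObjectIdentifier(type)] = handler
}

func respond(to value: Any) {
    // Look up the handler using the dynamic type of the value.
    if let handler = handlers[ObjectIdentifier(type(of: value))] {
        handler(value)
    }
}

register(Int.self) { print("got an Int: \($0)") }
respond(to: 42)    // prints "got an Int: 42"

It works, but every key goes through the extra ObjectIdentifier wrapper, which is exactly the “extra object/value” gripe above.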

The problem, honestly, lies entirely in the implementation of Dictionary. The dictionary implementation should really look something like this:


public class Dictionary<T, U> {
    public init(capacity: Int, hasher: @escaping (T) -> Int, comparer: @escaping (T, T) -> Bool) { /* ... */ }
}

Now mind you, the Swift thing to do is to use a struct type with copy-on-write semantics, but the important thing is the constructor, into which you pass your own hash function and your own comparer. With this type of constructor, you can put anything you want into a dictionary as long as you pass in the right closures. On top of that, the compiler, which gives you sweet syntactic sugar on top of Dictionary, will still work if your type implements Hashable and Equatable, because it can use closures that adapt onto Hashable and Equatable.

Still, the non-uniformity of Swift types is maddening. Depending on the module writers to “just do the right thing” is exactly the wrong approach. How can we expect third-party authors to do the right thing when Apple doesn’t? But perhaps Apple will fix the core types. Apple has been rather blithe about breaking compatibility with every release of Swift, and they could fix Dictionary, tuples, and metatypes if they wanted to.


I’m Old, Part XXX: Make It Go Faster

When I started doing serious coding for the Macintosh at Bell Communications Research, I was originally writing code on a Lisa running MacOS. The compiler we had was Aztec C by Jim Goodnow II. It was slow to load and ran its own weak shell, based on a UNIX shell, but without most of the goodies. It was there mostly to run their own version of make or for you to call the compiler and linker on your own.


It ran like treacle in January. As my project grew, I would type in some bug fixes in a point-and-click editor, save them, start up the shell, start a build, then take a walk. After coming back, I copied the executable onto a floppy, then tried it on a Mac 128, debugged it, and started again. The speed was understandable. The original Mac wasn’t all that fast, and the compiler was based on the traditional UNIX tool chain. It ran the C pre-processor, then the compiler, then the assembler, and then finally the linker, which was not particularly well-suited to the monolithic Macintosh environment.

Later, when my Mac was upgraded to a “Fat Mac” and had an outboard ProFile (aka “SlowFile”) hard drive, it had enough oomph for me to develop natively.

A couple years later, I got a copy of Lightspeed C, a new entry in the Mac C compiler market. It was shy on documentation, but it ran like a bat out of hell compared with Aztec C. They were so proud of the performance that during compilation, they put up a dialog box with a line count. It was very impressive.

Later at Adobe, working on Acrobat, we had many thousands of lines to compile on a full rebuild, and we had tricks to make that go even faster with Lightspeed C (now Think C). The first trick was a separate high-speed hard drive for the source. The second was a buttload of RAM, a big disk cache, and a RAM disk. We put the system headers on the RAM disk, and you could tell when the compiler hit the system headers because there was no hard drive access and the line counter went way faster.

Oddly enough, we noted that if the screen saver kicked on during a compile, you could still see the line counter ticking by, which is odd because the point of the screen saver is to prevent exactly this kind of thing from happening.

We found out that the line counter had been slowing down the compiler. Let that sink in: drawing numbers to the screen was taking up too much time. What they found was that Apple’s font rendering in QuickDraw was the bottleneck, so they wrote code to write pictures of numbers directly to video RAM. This is a no-no and would never play nicely with screen savers.

All in the name of speed.

Writing a PNG Decoder in F#, Part 3

In Part I, I showed how to use F# records to model PNG chunks. In Part II, I showed how to calculate a CRC code and did some medium-weight bit-banging. Today, I’m going to show the code to read chunks and optionally validate the CRC. Then we’ll sew up some code to read all the chunks at once and aggregate them.

Recall that this is the structure of a chunk:

|length|tag|data|crc|

First, we read the length and the tag, then we mark the position of the data by saving the stream position. Then we either skip past the data or calculate its CRC:

let readChunk stm validateCrc =
    let length = readInt stm |> uint32
    let tag = readFourCC stm
    let position = stm.Position
 
    let crcCalc = if validateCrc then streamSeq stm (position - 4L) (int length + 4) |> crc32
                  else
                        stm.Seek(position + (int64 length), SeekOrigin.Begin) |> ignore
                        (uint32 0)
    let crcRead = readInt stm |> uint32
    if validateCrc && crcCalc <> crcRead then raise (Exception("Invalid crc in chunk " + tag.ToString()))
    makeChunk tag length crcRead position

For the calculated CRC, I make a stream-based byte sequence based on where the data is and run that straight into the crc32 function, which conveniently enough takes a sequence. Note that the sequence starts 4 bytes back, at the tag, and runs for length + 4 bytes, because the PNG CRC covers the tag and the data but not the length. Finally, if the CRC passes, we call makeChunk based on everything we’ve read. Why make the CRC calculation an option? Because, quite honestly, CRCs fail so very rarely that it just isn’t worth the time. It’s way faster to seek past the data.

Wait, what was that about a stream sequence?

OK – I love F# sequences. They’re lazy generators of data, not unlike an I/O stream. So let’s see how that works:

let streamSeq (stm:Stream) position len =
    let rec helpRead len =
        seq {
            if len > 0 then
                let b = stm.ReadByte()
                if (b >= 0) then
                    yield (byte b)
                    yield! helpRead (len - 1)
        }
    stm.Seek(position, SeekOrigin.Begin) |> ignore
    helpRead len

In this case, helpRead is a tail-recursive helper function that reads a byte, and if it’s not at end of file, yields it and recurses. The main routine seeks to the data start and then calls the helper. Nice.

What’s left is to build a function to read all the chunks, or more precisely, a function that can turn a stream into a sequence of chunks:

let rec chunkSeq (stm:Stream) validateCrc =
    seq {
        if stm.Position < stm.Length then
            yield readChunk stm validateCrc
            yield! chunkSeq stm validateCrc
    }

This is also very nice. Basically, as long as we have data to read, we yield up a chunk and then recurse.

When it comes to aggregating the chunks, it gets a little sloppier, mostly because I chose to use the .NET collections List and Dictionary instead of the F# types list and map. The problem I have is that I need to append to the lists, which is pessimal behavior for an F# list, and I feel that the near-constant-time lookup of the hash-table-based Dictionary will be much better than that of the AVL-based map. And like most things, F# discourages you from using .NET collections by making the resulting code uglier.

let readAllChunks stm validateCrc =
    let inline chunkAcc (d:Dictionary<FourCC, List<Chunk>>) (chunk:Chunk) =
        let chunklist = match d.TryGetValue(chunk.name) with
                        | (true, l) -> l
                        | (_, _) ->
                            let l = List<Chunk>()
                            d.Add(chunk.name, l)
                            l
        chunklist.Add(chunk)
        d
    chunkSeq stm validateCrc |> Seq.fold chunkAcc (Dictionary<FourCC, List<Chunk>>())

What I do is use the inline function chunkAcc to accumulate a new chunk into a Dictionary of FourCC mapped onto a List of Chunk. The mapping works like this: if the chunk’s name (a FourCC) is in the dictionary, we return the list to which it maps; if not, we make a new list and put the tag and list into the dictionary. Once that’s done, we append the chunk to the list and return the dictionary that was passed in. All this gets wrapped up by running a chunk sequence through a fold into an empty dictionary. When all is said and done, we get a Dictionary of all the chunks in the file.

My test code took a directory of PNG files and pumped them all through readAllChunks, with and without CRC checks, and they all passed.
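For flavor, here’s a rough sketch of what such a test driver could look like (not the actual test code; the testDirectory and skipSignature names are made up for illustration, and it assumes nothing upstream has already consumed the 8-byte PNG signature that precedes the first chunk):

open System.IO

// PNG files start with an 8-byte signature before the first chunk.
let skipSignature (stm:Stream) =
    stm.Seek(8L, SeekOrigin.Begin) |> ignore

let testDirectory (path:string) =
    for file in Directory.GetFiles(path, "*.png") do
        // upcast to Stream to match the Stream-typed readers above
        use stm = File.OpenRead(file) :> Stream
        skipSignature stm
        let chunks = readAllChunks stm true   // true = validate CRCs
        printfn "%s: %d distinct chunk types" file chunks.Count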

This code is fine, but it is not particularly resilient, nor does it guarantee that a given PNG file is correct. What we would like is something that, even in the event of a damaged file, may still decode an image. The PNG spec specifies that certain chunks must appear in a file in a particular order, but they could be followed by non-required chunks which are damaged or truncated. Although the file is damaged, the image should still be readable. Next time, we’ll cover how to handle broken files as well as how to validate what we’ve read.

I’m Old, Part XIX: You Got What For Doing That?

Let’s roll back the clock to 1992. When you were working on most systems, you had at least one CRT monitor. If you were super lucky, you had two, because screen real estate makes a big difference for writing code. I much prefer a dual head system over a single for developing and testing code just for being able to have documentation up on one screen and code on the other, or test rigs on one and source on the other.

At Adobe when I was in the printer division, I had single head machines. First a Sun workstation with a black and white screen and then later a DECStation 3100 with a color monitor.

When I moved to Acrobat, I was given a Macintosh IIcx and a black and white monitor. With some cajoling, I managed to get an outboard hard drive and a small color monitor.

The Macintosh had full support in the OS for multiple displays, but not all applications dealt with it gracefully, because although the support was there to render no matter what, you usually want to do something different on a 1-bit display than on a color one, and when you span several bit depths, it gets harder. I mentioned earlier that Joe Holt had put together a nifty chunk of code to make that much easier.

If you had a CRT, then chances are you ran a screen saver to keep the display from getting an image permanently burned into it. On the Mac, Berkeley Systems capitalized on this by writing a screen saver framework, and people wrote all kinds of plug-in screen savers for it. Berkeley Systems put together special packages, some with themes, and sold those too. I ran the Simpsons screen saver on my machine, usually the Itchy & Scratchy module, which was truly inspired. It hooked into the Finder to get your desktop layout, and Itchy used all kinds of Finder items to kill Scratchy. And oddly enough, more screen savers honored multiple monitors well than typical desktop apps did.

Berkeley Systems also ran contests where programmers would compete for some serious prizes. The nice thing about Adobe at the time was that it was chock full of creative people. For example, Harrison Page wrote a screen saver called “Botheration” which played sound samples, picked at random from a collection, at random intervals. One such sample was Harrison saying, “Hey, Dan?” along with other similar things to annoy the people who shared cubes around him. I’m surprised that Dan didn’t wound Harrison.

One day, I stopped by Ed Hall’s office and saw that he was running a screen saver on his Mac that looked every bit like an MS DOS machine. When the screen saver started, it imitated a DOS boot including a fake memory test that reflected how much memory was installed. Then it auto-typed DOS commands, wandering around your file system, listing directories, printing the date and time and so on.

I commented on it and Ed mentioned that he got paid $1,000/hour to write it. Wait, what? Ed saw the contest, and spent 10 hours writing the screen saver. He entered it in Berkeley’s contest and won $10,000. For 10 hours of “work”. Nice one, Ed.

I’m Old, Part XVIII: Office Politics

When I started at Adobe in 1990, the company was in a set of buildings off of Charleston Road in Mountain View. These buildings are now occupied by Google. They are your bog-standard California office buildings: offices and conference rooms around the perimeter, elevators and bathrooms in the center, and big open spaces that were either labs or “cube land”.

[Image: cubicles]

There was no space in the area dedicated to my department on the second floor, so I was put into a cube on the first. But the printer I was working on was in Kathie Riggle’s cube on the second floor, so I had to bounce around a lot, and finding me was a trick. Eventually, I was temporarily moved into an office that I shared with Mike Hodgson, which worked out well since both of us were quiet, but there was a guy down the hall who was a whistler. It was a habit of his that he couldn’t break. He whistled when he was concentrating, and it was high-pitched and tuneless. Or, if it had a tune, he had devised a non-western scale, because yikes. It drove me nuts.

Later, I was moved into a solo cube with the people in my department, and it was probably the worst location you could have. It was just inside the door at the top of the stairs. Any conversations that people had as they came up the stairs spilled over into my cube, derailing my thoughts. Guh.

When I moved into the Acrobat division, I was in another building, but once again moved into a cube. It had a better location, but it was still a cube.

And honestly, cubes are awful. They’re not as bad as open floor plans, but they’re bad. The issue is that many programmers have 3 modes: distraction, low-intensity focus, and high-intensity focus. In the past, I’ve referred to the high-intensity focus as “code blur”. It feels like a folding of time, or at least of the perception of it, as 3 or more hours vanish while I’m working on something. Code blur is a good thing because it represents a block of very high productivity. It is not free, though. It sometimes takes sheer force of will to get into that depth of concentration, and it is all too easily brought to an end by interruptions. Like a conversation outside your cube. You know how you can prevent that? By shutting a door. Headphones don’t cut it for me because every type of headphone I’ve tried makes my ears sore inside an hour.

I had heard rumors about a rubric for determining who got an office and who didn’t. It was based on the number of years of experience, the number of years at Adobe, and the number of college degrees. No matter how you sliced it, I figured I was due after I had been on the Acrobat team for a couple of years, because there were people in my group who scored lower than me who were in offices. And there were two or three offices open. I asked and was denied, so I did the only thing an engineer could do: I got passive-aggressive. There was a “joke” PostScript font that someone had created called Ransom. I left my manager a couple of notes set in it. One day he came by to talk to me about something and I said, “you know, I’m starting to like my cube. I think I’ll go buy a pre-hung door and install it here so I can get the best of both worlds.”

I got an office sometime later. I’m not proud of my behavior in retrospect, but I was very happy with my boost in productivity.

Much later on, I read some research that was done on programmer productivity and office design and the research backed up my personal observations: programmers are more productive in offices with doors and least productive in loud open environments.