I’m Old, Part LXXVI: Trying Crazy Things

When I worked at Atalasoft, we did most of our work in managed languages: C#, F#, or JavaScript. We often had third-party libraries written in C or C++ that we exposed through C#, which was part of our value add: binding C to a decent C# API is not always easy, and the most obvious solution isn’t always the best one.

One chunk of technology we built in house was a library with tools to consume and generate PDF files. We were our own customer, in that many of our separate PDF tools were written to our own API. For example, we had tools to generate PDF documents from images, to render the output from OCR engines, and to generate and manipulate annotations, all written to our own API.

To be clear, though, the API we used was a private API. There was a reason for that: it hewed very closely to the PDF standard and had very few safeguards against your own stupidity when it came to creating spec-violating documents, so we felt that it was not suitable for our customers. We had learned that lesson by getting it wrong with the TIFF spec. Exposing the dangerous API helps very few people but creates a support nightmare.

When we released a friendly API on top of the internal PDF library we tried to make it near impossible to generate bad PDF.

At this point, I tried doing the things that were on the border of unreasonable if not fully within crazy-pants territory, just to see how the code would do.

I grabbed the text of Moby Dick from Project Gutenberg and wrote a C# app to render it. This is nearly 600 pages. I included special casing for chapter headers with chapter numbers and drop caps, page numbering and so on. It rendered in a few seconds with no special considerations about memory or buffering. Most PDF print drivers can’t do it that quickly.

I decided to put annotations through the wringer. I wrote code that took a sample image that we had, resampled it, and rendered each block of 8×8 pixels as a colored rectangle annotation on a PDF page.

The tools put down 1400+ annotations and saved the document in under 1.5 seconds. It took Acrobat more than 5 minutes just to open the file and render it. My code could open it in slightly longer than it took to create it. I should point out that originally, I was rendering the image in annotations at a substantially higher resolution, more than 4K annotations, and that just plain hung Acrobat.
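The resampling step can be sketched roughly as follows. This is a minimal illustration in Python, not the original C# code: `RectAnnotation` and `image_to_annotations` are hypothetical names, the image is assumed to be a plain grid of RGB tuples, and each block is assumed to be reduced to its average color (the post does not say how the blocks were colored).

```python
from dataclasses import dataclass
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


@dataclass(frozen=True)
class RectAnnotation:
    """Hypothetical stand-in for a filled rectangle annotation."""
    x: float
    y: float
    width: float
    height: float
    color: Pixel


def image_to_annotations(pixels: List[List[Pixel]], block: int = 8,
                         scale: float = 1.0) -> List[RectAnnotation]:
    """Turn each block x block tile of the image into one colored rectangle."""
    img_h = len(pixels)
    img_w = len(pixels[0]) if img_h else 0
    annotations = []
    for top in range(0, img_h, block):
        for left in range(0, img_w, block):
            # Collect the pixels of this tile (edge tiles may be smaller).
            cells = [p for row in pixels[top:top + block]
                     for p in row[left:left + block]]
            # Assumption: the tile color is the per-channel average.
            avg = tuple(sum(p[c] for p in cells) // len(cells) for c in range(3))
            tile_w = min(left + block, img_w) - left
            tile_h = min(top + block, img_h) - top
            # PDF user space has its origin at the bottom-left, so flip y.
            bottom = img_h - (top + tile_h)
            annotations.append(RectAnnotation(
                left * scale, bottom * scale,
                tile_w * scale, tile_h * scale, avg))
    return annotations
```

As a sense of scale, a hypothetical 320×280 image produces 40 × 35 = 1,400 rectangles with the default block size, which is the order of magnitude described above.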

For fun, I decided to do an unthinkable pet project. I wrote a PDF sketch app in F#. For the design, I adopted a fairly traditional Model-View-Controller and I tried to keep things strict: the model and view model were totally distinct. The model was in F# – a nice discriminated union to describe various shapes. The view model was my PDF toolkit being treated as write only. The view was our PDF rasterizer (which was written by FoxIt and was in C).

So what happened was that when the user drew a shape on the page, the F# code would render the view as PDF, render the UI artifacts (shape handles, bezier control points, etc.) as PDF on top of that, write the result to a stream, and send it to FoxIt to render to an image, which I then stamped blindly onto the display window.
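To make the model-to-PDF step concrete, here is a rough sketch in Python rather than the original F#. The shape classes approximate a discriminated union with frozen dataclasses, and all of the names (`Line`, `Rect`, `Bezier`, `shape_ops`, `render_page_content`) are hypothetical. The output is only a page content stream using the standard PDF path operators (`m`, `l`, `c`, `re`, `S`, `f`); wrapping it in a complete PDF file and handing it to a rasterizer is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Point = Tuple[float, float]


@dataclass(frozen=True)
class Line:
    start: Point
    end: Point


@dataclass(frozen=True)
class Rect:
    origin: Point
    width: float
    height: float


@dataclass(frozen=True)
class Bezier:
    start: Point
    control1: Point
    control2: Point
    end: Point


# Python's closest equivalent to a discriminated union of shapes.
Shape = Union[Line, Rect, Bezier]


def num(v: float) -> str:
    """Format a number for PDF (no exponent notation, no trailing zeros)."""
    return f"{v:.3f}".rstrip("0").rstrip(".")


def shape_ops(shape: Shape) -> str:
    """Emit stroke operators for one shape."""
    if isinstance(shape, Line):
        pts = [shape.start, shape.end]
        return "{} {} m {} {} l S".format(*(num(c) for p in pts for c in p))
    if isinstance(shape, Rect):
        x, y = shape.origin
        return f"{num(x)} {num(y)} {num(shape.width)} {num(shape.height)} re S"
    if isinstance(shape, Bezier):
        pts = [shape.start, shape.control1, shape.control2, shape.end]
        return "{} {} m {} {} {} {} {} {} c S".format(
            *(num(c) for p in pts for c in p))
    raise TypeError(f"unknown shape: {shape!r}")


def handle_points(shape: Shape) -> List[Point]:
    """Points where the UI would draw grab handles."""
    if isinstance(shape, Line):
        return [shape.start, shape.end]
    if isinstance(shape, Rect):
        x, y = shape.origin
        w, h = shape.width, shape.height
        return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    if isinstance(shape, Bezier):
        return [shape.start, shape.control1, shape.control2, shape.end]
    raise TypeError(f"unknown shape: {shape!r}")


def render_page_content(shapes: List[Shape],
                        selected: Optional[Shape] = None) -> str:
    """Shapes first, then UI handles (4x4 filled squares) on top."""
    lines = [shape_ops(s) for s in shapes]
    if selected is not None:
        for x, y in handle_points(selected):
            lines.append(f"{num(x - 2)} {num(y - 2)} 4 4 re f")
    return "\n".join(lines)
```

Because the handles are appended after the shapes in the same content stream, they paint on top of the drawing, which mirrors the "UI artifacts as PDF on top of that" step described above.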

All this happened at UI speed. The app was creating, rasterizing, and then throwing away 40 PDF documents per second. I did a company demo of this to help drive home the performance of our tools so that sales staff could internalize the selling point.

I was also ready for the next question: why don’t we ship this demo? I started putting lots of shapes on the page and the app got visibly slower. At around 20 shapes it became unusable, which I knew would happen all along: FoxIt couldn’t keep up. It was still quick, but as the file grew in complexity, so did the rendering time. That is why most real drawing apps have a special-case renderer for this kind of work: they render the parts that don’t change and cache the result, render the changing parts separately, then composite the two into the display.
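The cache-and-composite approach that real drawing apps use can be sketched like this. It is a toy illustration in Python under loud assumptions: `LayeredRenderer` and `toy_rasterize` are hypothetical names, a "layer" is just a dictionary from pixel coordinates to colors, and the expensive rasterizer is passed in as a plain function. It is not how the demo app or any particular drawing app is implemented.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence, Tuple

Layer = Dict[Tuple[int, int], str]  # pixel coordinate -> color name


def toy_rasterize(shapes: Sequence[Tuple[int, int, str]]) -> Layer:
    """Toy stand-in for an expensive rasterizer: each shape is one dot."""
    return {(x, y): color for x, y, color in shapes}


class LayeredRenderer:
    """Re-rasterize the unchanged shapes only when they actually change."""

    def __init__(self, rasterize: Callable[[Sequence], Layer]):
        self._rasterize = rasterize
        self._cache_key: Optional[Tuple[Hashable, ...]] = None
        self._cached_layer: Layer = {}

    def render(self, static_shapes: Sequence, active_shapes: Sequence) -> Layer:
        key = tuple(static_shapes)
        if key != self._cache_key:
            # Static content changed: pay the full rendering cost once.
            self._cached_layer = self._rasterize(static_shapes)
            self._cache_key = key
        # The shapes being edited are rendered fresh on every frame.
        overlay = self._rasterize(active_shapes)
        # Composite: start from the cached layer, paint the overlay on top.
        frame = dict(self._cached_layer)
        frame.update(overlay)
        return frame
```

With this split, the per-frame cost depends on the shapes being edited rather than on everything on the page, which is the property the render-everything-as-PDF demo lacked.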

What’s the point?

No. It’s not the Jurassic Park lesson, “Your scientists were so preoccupied with whether they could, they didn’t stop to think if they should.” The lesson is parallel, though. Go ahead and figure out where the “could we” line is. The “should we” line will be obvious afterwards. In other words: know your strengths and know your limits.

None of the crazy samples made it into our documentation or into our usual stable of sample code that we shipped with the toolkit, but it was nice to know that if our customers tried something crazy (and trust me, they did), we were there to back them up.
