Writing a PNG Decoder in F#, Part 3

In Part I, I showed how to use F# records to model PNG chunks. In Part II, I showed how to calculate a CRC code and did some medium-weight bit-banging. Today, I’m going to show the code to read chunks and optionally validate the CRC. Then we’ll sew up some code to read all the chunks at once and aggregate them.

Recall that this is the structure of a chunk:

|length|tag|data|crc|
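For reference, the chunk model from Part I boils down to a record carrying the tag, the length, the CRC, and the position of the data in the stream. The exact definitions live in Part I; this is just a sketch of the shape, reconstructed from how the names are used below (the FourCC representation here is a made-up stand-in):

```fsharp
// Sketch of the types from Part I -- field names match how they're
// used later (chunk.name, makeChunk tag length crc position), but the
// FourCC representation is an assumption.
type FourCC = FourCC of uint32

// A chunk records where its data starts rather than the data itself.
type Chunk = { name: FourCC; length: uint32; crc: uint32; position: int64 }

let makeChunk name length crc position =
    { name = name; length = length; crc = crc; position = position }
```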

 
First, we read the length and the tag, then mark where the data begins by saving the stream position. From there, we either skip past the data or calculate its CRC:

let readChunk stm validateCrc =
    let length = readInt stm |> uint32
    let tag = readFourCC stm
    let position = stm.Position

    // The CRC covers the tag and the data, so it starts 4 bytes back
    // from the data and runs for length + 4 bytes. If we're not
    // validating, just seek past the data instead.
    let crcCalc = if validateCrc then streamSeq stm (position - 4L) (int length + 4) |> crc32
                  else
                        stm.Seek(position + (int64 length), SeekOrigin.Begin) |> ignore
                        (uint32 0)
    let crcRead = readInt stm |> uint32
    if validateCrc && crcCalc <> crcRead then raise (Exception("Invalid crc in chunk " + tag.ToString()))
    makeChunk tag length crcRead position

For the calculated CRC, I make a stream-based byte sequence covering where the data is and run that straight into the crc32 function, which conveniently enough takes a sequence. Finally, if the CRC passes, we call makeChunk with everything we’ve read. Why make the CRC calculation optional? Because, quite honestly, CRCs fail so very rarely that the check usually isn’t worth the time; it’s much faster to seek past the data.
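One detail worth spelling out: per the PNG spec, a chunk’s CRC covers the tag and the data but not the length field. That’s why the code backs up 4 bytes from position (the start of the data) and reads length + 4 bytes. The offsets work out like this (pure arithmetic, no stream needed):

```fsharp
// Offsets within a chunk, relative to the chunk's first byte:
//   0..3  length (4 bytes, not covered by the CRC)
//   4..7  tag    (4 bytes, covered)
//   8..   data   (length bytes, covered)
// After reading the length and tag, stm.Position is chunkStart + 8L.
let chunkStart = 100L
let length = 13                          // e.g. an IHDR chunk
let position = chunkStart + 8L           // where the data begins
let crcStart = position - 4L             // back up over the tag
let crcLen = length + 4                  // tag + data
// crcStart = 104L (the tag), crcLen = 17
```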

Wait, what was that about a stream sequence?

OK – I love F# sequences. They’re lazy generators of data, not unlike an I/O stream. So let’s see how that works:

let streamSeq (stm:Stream) position len =
    let rec helpRead len =
        seq {
            if len > 0 then
                let b = stm.ReadByte()
                if (b >= 0) then
                    yield (byte b)
                    yield! helpRead (len - 1)
        }
    stm.Seek(position, SeekOrigin.Begin) |> ignore
    helpRead len

In this case, helpRead is a tail-recursive helper function that reads a byte and, if the stream isn’t at end of file, yields it and recurses. The main routine seeks to the data start and then calls the helper. Nice.
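To sanity-check streamSeq against something self-contained, here’s the same function run over a MemoryStream:

```fsharp
open System
open System.IO

let streamSeq (stm:Stream) position len =
    let rec helpRead len =
        seq {
            if len > 0 then
                let b = stm.ReadByte()
                if (b >= 0) then
                    yield (byte b)
                    yield! helpRead (len - 1)
        }
    stm.Seek(position, SeekOrigin.Begin) |> ignore
    helpRead len

// Pull four bytes starting at offset 2 from an in-memory stream.
let stm = new MemoryStream([| 0uy .. 9uy |])
let bytes = streamSeq stm 2L 4 |> Seq.toList
// bytes = [2uy; 3uy; 4uy; 5uy]
```

One caveat: the Seek happens when streamSeq is called, but the reads happen at enumeration time, so enumerate the sequence before doing anything else with the stream.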

What’s left is to build a function to read all the chunks, or more precisely, a function that can turn a stream into a sequence of chunks:

let rec chunkSeq (stm:Stream) validateCrc =
    seq {
        if stm.Position < stm.Length then
            yield readChunk stm validateCrc
            yield! chunkSeq stm validateCrc
    }

This is also very nice. Basically, as long as we have data to read, we yield up a chunk and then recurse.
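The same stream-to-sequence pattern works for any reader function, and the laziness means a consumer can stop early without reading the rest of the stream. A self-contained illustration (readRecord here is a made-up stand-in for readChunk):

```fsharp
open System.IO

// readRecord is a stand-in for readChunk: one "record" = one byte.
let readRecord (stm:Stream) = stm.ReadByte() |> byte

let rec recordSeq (stm:Stream) =
    seq {
        if stm.Position < stm.Length then
            yield readRecord stm
            yield! recordSeq stm
    }

let stm = new MemoryStream([| 10uy; 20uy; 30uy |])
// Taking only two records never reads the third byte.
let firstTwo = recordSeq stm |> Seq.take 2 |> Seq.toList
// firstTwo = [10uy; 20uy]
```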

When it comes to aggregating the chunks, things get a little sloppier, mostly because I chose to use the .NET collections List and Dictionary instead of the F# types list and Map. The problem is that I need to append to the lists, which is pessimal behavior for an F# list, and I expect the near-constant-time lookup of the hash-based Dictionary to be much better than that of the AVL-tree-based Map. And, like most things, F# discourages you from using .NET collections by making the resulting code uglier.

let readAllChunks stm validateCrc =
    // Accumulate one chunk into the dictionary, creating the per-tag
    // list the first time a tag is seen.
    let inline chunkAcc (d:Dictionary<FourCC, List<Chunk>>) (chunk:Chunk) =
        let chunklist = match d.TryGetValue(chunk.name) with
                        | (true, l) -> l
                        | (_, _) ->
                            let l = List<Chunk>()
                            d.Add(chunk.name, l)
                            l
        chunklist.Add(chunk)
        d
    chunkSeq stm validateCrc |> Seq.fold chunkAcc (Dictionary<FourCC, List<Chunk>>())

What I do is use the inline function chunkAcc to accumulate each new chunk into a Dictionary of FourCC mapped onto a List of Chunk. The mapping works like this: if the chunk’s name (a FourCC) is in the dictionary, we return the list it maps to; if not, we make a new list and put the tag and list into the dictionary. Once that’s done, we append the chunk to the list and return the dictionary that was passed in. All of this gets wrapped up by running a chunk sequence through a fold into an empty dictionary. When all is said and done, we get a Dictionary of all the chunks in the file.
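For comparison, the immutable-collections version of the same accumulation might look like the sketch below (groupInto is a made-up generic helper, not code from the article). It dodges the append problem by prepending each item to its key’s list, which is O(1), and reversing each list once at the end so items come out in arrival order:

```fsharp
// Hypothetical immutable alternative: fold into a Map of key -> list,
// prepending (cheap) and reversing once at the end (restores order).
let groupInto (keyOf: 'a -> 'k) (items: seq<'a>) : Map<'k, 'a list> =
    items
    |> Seq.fold (fun m item ->
        let k = keyOf item
        let existing = defaultArg (Map.tryFind k m) []
        Map.add k (item :: existing) m) Map.empty
    |> Map.map (fun _ l -> List.rev l)

// Grouping chunk-like (tag, length) pairs by tag:
let grouped = groupInto fst [ ("IDAT", 100); ("IDAT", 200); ("IEND", 0) ]
// grouped.["IDAT"] = [("IDAT", 100); ("IDAT", 200)]
```

With the article’s types this would be `chunkSeq stm validateCrc |> groupInto (fun c -> c.name)`; the tradeoff is the Map’s logarithmic operations versus the Dictionary’s near-constant ones.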

My test code took a directory of PNG files and pumped them all through readAllChunks, with and without CRC checks, and they all passed.

This code is fine, but it is not particularly resilient, nor does it guarantee that a given PNG file is correct. What we would like is something that, even given a damaged file, may still decode an image. The PNG spec requires certain chunks to appear in a particular order, but they can be followed by non-required chunks that are damaged or truncated. Even though such a file is damaged, the image should still be readable. Next time, we’ll cover how to handle broken files as well as how to validate what we’ve read.
