I’m Old, Part XLIV: OCR and Support

At my last job, one of the many things I did was create .NET bindings to OCR engines. Optical Character Recognition is a tricky technology that is rife with hacks and tricks to try to get a better recognition rate on crappy scanned documents. In that technology space, there are around 10 decent-quality engines, and I’ve worked with most of them in one way or another.

Before I go on, I want to talk about OCR companies. All of the ones that I worked with are batshiat insane to some degree. I don’t say this lightly. All of them had unusual licensing terms and/or unique run-time licensing. Most of them offered a way to build their tool into a toolkit and also had an end-user application that they sold. One company in particular wrote a contract that said–and I swear, I’m not making this up–that you couldn’t use their toolkit to build software to recognize text and create documents. See what I mean? We hired an engineer, Justin, who early on was consistently appalled at this. Eventually, he got jaded enough that when one of these issues came up, if I started the sentence, “because all OCR manufacturers are…” he would quickly finish it, “…batshiat insane,” while rolling his eyes.

As part of my interview, I was given a task to design an OCR interface. I put on my architect hat and designed a nice little class hierarchy that created a decent, extensible, engine-neutral interface. Once I was hired, my first task was to learn C#; my second was to implement the OCR interfacing for real. I was given an engine to work with that was implemented as a C library. I built a set of C# tools that hid the specifics and sharp edges of the C library and presented instead a friendly interface that was easy to get started with and had room to grow. For example, the initial toolkit had the ability to translate a scanned document into a few basic document types, including PDF. I exposed those tools as if they were separate objects. Eventually, we added our own PDF output tool that was far better than that engine’s, and it stitched into the workflow without deep changes for our users. The toolkit was neutral enough that we were able to get 7 different OCR engines to work with the same front-facing interface.
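
To make that shape concrete, here is a minimal sketch of an engine-neutral front end with output formats exposed as separate objects. It is written in Python rather than the original C#, and every name in it (`OcrEngine`, `DocumentWriter`, `FakeEngine`, `PlainTextWriter`, `translate`) is hypothetical, made up for illustration rather than taken from the actual toolkit.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedPage:
    """Engine-independent result for one scanned page (hypothetical type)."""
    text: str


class OcrEngine(ABC):
    """Engine-neutral front end; each vendor engine would get one adapter."""

    @abstractmethod
    def recognize(self, image_bytes: bytes) -> RecognizedPage:
        ...


class DocumentWriter(ABC):
    """An output format exposed as its own object, independent of the engine."""

    @abstractmethod
    def write(self, pages: List[RecognizedPage]) -> bytes:
        ...


class FakeEngine(OcrEngine):
    """Stand-in adapter; a real one would call into the vendor's C library."""

    def recognize(self, image_bytes: bytes) -> RecognizedPage:
        # Pretend the "image" is ASCII text and "recognize" it as upper case.
        return RecognizedPage(text=image_bytes.decode("ascii").upper())


class PlainTextWriter(DocumentWriter):
    """Simplest possible writer: one line of text per page."""

    def write(self, pages: List[RecognizedPage]) -> bytes:
        return "\n".join(page.text for page in pages).encode("utf-8")


def translate(engine: OcrEngine, writer: DocumentWriter,
              images: List[bytes]) -> bytes:
    """Run any engine over the images and hand the pages to any writer."""
    pages = [engine.recognize(image) for image in images]
    return writer.write(pages)
```

Because `translate` only sees the two abstract types, swapping in a better writer (or a different engine) would not touch calling code, which is the property the paragraph above describes.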

The main problem with working with OCR engines is initializing their code, managing licensing, and preparing them to run. Every single engine had unique problems. Every. Single. One. Explaining this to our poor users was an uphill battle that our support engineers dealt with. We wrote sample code, documentation, and tech notes, all of which were routinely ignored.

One particular engine had truly inspired licensing and odd requirements about which directories had to be available for it to find DLLs and resource files. All of this setup had to be done well before you even touched the engine class or it would fail miserably. We documented this and set up examples that said “you must do this or you will see this error.” Many customers got this right.
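
One way to make that kind of ordering requirement harder to miss is to have the engine wrapper check its prerequisites up front and fail with a message that names what is missing. The sketch below is in Python rather than C#, and `EngineSetup`, `Engine`, and `EngineNotPreparedError` are hypothetical names; it is not how the real engine or our toolkit behaved, just an illustration of the idea.

```python
import os


class EngineNotPreparedError(RuntimeError):
    """Raised when the engine is constructed before its setup is complete."""


class EngineSetup:
    """Hypothetical record of what must be in place before first use."""

    def __init__(self, resource_dir: str, license_key: str):
        self.resource_dir = resource_dir
        self.license_key = license_key

    def problems(self) -> list:
        """Return a human-readable list of unmet prerequisites."""
        found = []
        if not os.path.isdir(self.resource_dir):
            found.append("resource directory not found: " + self.resource_dir)
        if not self.license_key:
            found.append("license key is empty")
        return found


class Engine:
    """Stand-in for the engine class; refuses to start on bad setup."""

    def __init__(self, setup: EngineSetup):
        problems = setup.problems()
        if problems:
            # Report every missing prerequisite at once, by name.
            raise EngineNotPreparedError("; ".join(problems))
        self.setup = setup
```

Constructing `Engine` with a missing directory or an empty key raises immediately with both problems listed, instead of failing later somewhere inside the vendor library.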

Then there was this one customer. He called into support angry. Angry because the engine was expensive and it wasn’t working. Our support engineer worked with him and explained what he needed to do (i.e., read the technote to him). He ignored the support engineer, had no luck, called back in, and escalated to an engineer. He was sent to my peer, Lou, who is very patient and told him pretty much the same thing the support engineer had told him, which he again ignored. He called back and wanted to escalate to me.

Now, we didn’t have a big office, so I knew exactly what was going on: angry customer who wouldn’t listen. Got it. Been there, done that, bought the t-shirt.

I got onto a remote desktop session with him and got on a speakerphone call. I had him show me his code to see what he was doing. I dictated breakpoints and asked him to show me the contents of particular variables. Great. I was pretty sure what was going on, but to be 100% sure I needed one more thi-he pulled up a web browser on top of his code and started to search the web. I said very directly something along the lines of, “do you want me to help you or are you going to surf the web?” He minimized the web browser and I looked at his code – yup – he had done nothing that any of the other engineers had suggested. I told him to give me control of the keyboard and mouse, put into his code the magic that the technote suggested, ran it, and saw correct results.

He ended up sending Elaine, one of our support engineers, a bouquet of flowers and called me a prima donna. And to this day, I still believe he is without a clue.
