How to Achieve a Paperless Law Office – Part 2:

Implementing OCR

Recently we discussed How to Achieve a Paperless Law Office. We explored the process required to achieve paperless bliss and the tools that support that process.

In this supplement we’ll cover a critical component of achieving a paperless law office: OCR.

What is OCR, and Why Should I Care?

At Uptime Legal we’ve helped many law firms go paperless. A key early consideration is making sure that OCR is part of the process. So what is OCR, and why does it matter?

Any time a document starts as a paper document, it must be scanned to become electronic. By default, a scanned document is essentially an image. You’ve probably seen this before: you use a scanner to scan a hardcopy document, and the resulting electronic file is an image, almost like a photograph of the document. (In fact, that’s exactly what it is.)

In other words, the text in the document you’ve scanned isn’t actually text at all; your scanned document is a photograph of text. As a result, you cannot select, copy or paste any text from the document.

More importantly, you can’t search for text in a photograph. These raw scanned documents will never show up in a search. If a scanned document contains a key piece of language you’ll need later, you’ll never find it with a search while it remains in its raw, un-OCR’d format.

OCR (Optical Character Recognition) is the process of converting the image of a scanned document into actual text that can be selected, copied, pasted and, most importantly, indexed and searched. OCR software processes these raw “photographs of documents,” interprets what it discerns as characters (letters and numbers) and converts them into actual, searchable text.

More Techie Info:

A searchable scanned PDF has two layers: an image layer and a (hidden) text layer. A freshly scanned document, before it’s OCR’d, has only an image layer: what you see when you open the PDF file. OCR software determines which characters actually appear in the scanned image layer and populates the hidden text layer with the document’s actual text. (Pretty cool, right?)

Example of OCR

Let’s look at an example to flesh out our description of OCR. We’ll look at a document that has not been OCR’d, and then at the same document after it’s been processed by OCR software.

Here’s a document fresh from the scanner. You can see the text in it, but it hasn’t been OCR’d. Notice that when I try to select any text in the document (maybe to copy and paste it), I can’t: it’s just an image.

Now, suppose I OCR this document. (We’ll talk about which software performs OCR shortly.) For this example, say I’ve stored this document in my firm’s Document Management System, which OCRs the document after it’s been uploaded.

Here is that same document, now OCR’d. Notice that its text is now selectable and, more importantly, will be indexed by my Document Management System and will show up in a search.

Think about how many documents your firm has collected over the years. Next, think about how many of those PDF documents came from scanners: many of them likely aren’t OCR’d, and therefore will never show up in a search. For many law firms, this can mean thousands of documents that will never be found in a search.

Beyond PDF Files

The importance of OCR goes beyond just PDF files. Most scanners will, by default, create PDFs from scanned documents. But inevitably some scanned documents (including documents that outside parties email you from their scanners) will arrive in other formats: image formats such as PNG, TIF, JPG and others.

OCR software (at least good OCR software) will also convert these true image files to OCR’d, text-enabled PDF files. Good Document Management software will do this automatically. More on this shortly.

How to Implement OCR

Okay, now you’re convinced that simply going paperless isn’t good enough: to achieve a meaningful paperless law office, you must pair it with OCR software. (You are convinced, right?) So how do you implement OCR software? There are a few ways; we’ll cover each and wrap up with the one that’s most effective for law firms.

1. Scanners’ Built-In OCR. Some scanners come with their own proprietary OCR software. The benefit of this method is that OCR happens immediately when you scan a document. The drawback is that it only OCRs documents that you scan, not scanned documents that others send to you. It only solves half of the OCR dilemma.

2. Stand-alone OCR Software. You could also implement stand-alone, third-party OCR software: purchase and install OCR software on every desktop in your firm, and instruct everyone to OCR every scanned document. Even Adobe Acrobat can OCR PDF files. The problem is that this relies on every one of your firm’s employees remembering to OCR scanned documents every time, without fail; it leans too heavily on the discretion of every person in your firm. It also means you’d have to install and maintain OCR software on every single computer your firm uses.

3. Integrated, Automatic OCR. Finally, we arrive at automatic OCR. In this method, the Document Management System you use to store, organize and manage your matter documents performs OCR for you, automatically. The best way to achieve a paperless law office is to implement OCR in a way that ensures every scanned document (PDF, image or otherwise) is OCR’d every time, without any user intervention and without having to install extra software.

This way, how the document got to you doesn’t matter. Whether you receive a document in hardcopy and scan it, receive it by email, or receive it in discovery, once it’s uploaded to your Document Management System it’s automatically OCR’d, indexed and searchable.
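For readers who want to picture what “automatic” means behind the scenes, here is a minimal Python sketch of the routing decision a Document Management System might make when a file is uploaded. It is an illustration only, not how any particular product works: the Document type, the plan_ingest function, the step names and the list of image extensions are all hypothetical, and the sketch assumes the system already knows whether a PDF has a text layer.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical list of image formats treated as scans that need OCR.
IMAGE_EXTENSIONS = {".png", ".tif", ".tiff", ".jpg", ".jpeg"}


@dataclass
class Document:
    name: str
    has_text_layer: bool  # assumption: True if the PDF already holds searchable text


def plan_ingest(doc: Document) -> list[str]:
    """Return the (hypothetical) processing steps applied to an uploaded file."""
    ext = PurePath(doc.name).suffix.lower()
    steps: list[str] = []
    if ext in IMAGE_EXTENSIONS:
        # A true image file: convert it to PDF, then OCR it so it gains a text layer.
        steps += ["convert_to_pdf", "ocr"]
    elif ext == ".pdf" and not doc.has_text_layer:
        # A raw scan with only an image layer: OCR populates the text layer.
        steps.append("ocr")
    # Every document, however it arrived, ends up in the search index.
    steps.append("index")
    return steps


print(plan_ingest(Document("deposition_scan.pdf", has_text_layer=False)))
print(plan_ingest(Document("exhibit_7.JPG", has_text_layer=False)))
print(plan_ingest(Document("brief.pdf", has_text_layer=True)))
```

The point of the design is that the decision happens at upload time rather than at someone’s desk: a raw scan, an emailed JPG and an already-searchable brief all pass through the same function, so nothing depends on a person remembering to run OCR.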

Not all Document Management software has OCR built in. Some integrate with third-party OCR software, which is better than nothing but still requires you to set up and manage another system. In our experience helping hundreds of law firms move to the cloud and go paperless, we’ve learned a few important lessons. Among them: it’s best to implement a Document Management System that includes automatic, integrated OCR.

Closing the Loop

There you have it. We hope this article helps in your paperless law office endeavors, and that you now not only understand the importance of implementing OCR in your law practice, but also have a solid direction for finding the right way to implement it at your firm.

Good luck.

About the Author: Dennis Dimka
Dennis Dimka is the CEO and founder of Uptime Legal Systems, North America's leading provider of technology, cloud and marketing services to law firms. Dennis is the author of Law Practice as a Service: How and Why to Move Your Law Firm to the Cloud. Follow Dennis on LinkedIn.