Scanned Paper (part IV)

Objective Coding:

“Who, What, When” will help you figure out “Where and How”

One definition of the word objective I found online is:

“uninfluenced by emotions or personal prejudices; presented factually”. 

This is accurate when it comes to the objective coding of documents in the not so objective world of litigation.  While you may believe that having OCR for .pdfs or associated text for .tiffs gives you the searching capabilities that you need, having documents objectively coded will really allow you to refine those searches and home in on the specific documents you are looking for. 

OCR and associated text allow you to search for keywords through the entire text of the document, whereas objective coding allows you to choose specific fields where the name, word or date you are looking for exists.  You can also create a list of document types (e.g., Email, Financial Record, Police Report, Memo) specific to your case so that you can identify a specific subset of documents for review.  Finally, you can combine information found in different fields to refine your search even further. 

The standard objective coding fields are:

  1. Author
  2. Recipient
  3. Copyee
  4. Date
  5. Title
  6. Document Type

________________________________________________________________________

For example:

Your search results for a document authored by “John Smith” on “January 1, 2010” would differ tremendously depending on whether you used OCR or objective coding to run your search. 

If you only used OCR for your search, you would find every single document that not only was authored by “John Smith” but in which his name appears.  You would also retrieve every document in which “January 1, 2010” appears, even if it was simply mentioned in the body of the text.  This could result in an unwieldy subset of documents and not really help you identify the particular subset of documents you are looking for. 

However, if your documents were objectively coded, you could simply search in the Author field for “John Smith” and in the Date field for “January 1, 2010” and find any documents that specifically fit those criteria. 
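
To make the difference concrete, here is a minimal sketch in Python.  It is only an illustration: the three sample records, the field names ("author", "date", "doc_type", "text") and the two helper functions are all made up for this example and do not reflect how any particular review platform stores or searches documents.

```python
from datetime import date

# Hypothetical records: "text" stands in for OCR output, and the other
# keys stand in for objective coding fields.
documents = [
    {"id": "DOC-001", "author": "John Smith", "date": date(2010, 1, 1),
     "doc_type": "Memo", "text": "Memo from John Smith dated January 1, 2010."},
    {"id": "DOC-002", "author": "Jane Doe", "date": date(2010, 3, 15),
     "doc_type": "Email", "text": "As John Smith said on January 1, 2010, ..."},
    {"id": "DOC-003", "author": "John Smith", "date": date(2010, 2, 2),
     "doc_type": "Email", "text": "Following up on our call. John Smith"},
]


def full_text_search(docs, *terms):
    """Return IDs of documents whose text contains every term (case-insensitive)."""
    return [d["id"] for d in docs
            if all(term.lower() in d["text"].lower() for term in terms)]


def fielded_search(docs, **criteria):
    """Return IDs of documents whose coded fields equal every given value."""
    return [d["id"] for d in docs
            if all(d.get(field) == value for field, value in criteria.items())]


ocr_hits = full_text_search(documents, "John Smith", "January 1, 2010")
coded_hits = fielded_search(documents, author="John Smith", date=date(2010, 1, 1))

print(ocr_hits)    # ['DOC-001', 'DOC-002']
print(coded_hits)  # ['DOC-001']
```

The text search also returns DOC-002, which merely mentions the name and the date, while the fielded search returns only the document actually authored by John Smith on that date.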

________________________________________________________________________

If you don’t get objective coding with your discovery, ask for it.  Objective coding contains objective information – facts, not opinions or ideas about a document.  Opposing counsel would not be revealing any information about their case, about their case strategy or about the strengths and weaknesses of the discovery by sharing any objective coding they have done.  No privilege would be breached and no attorney work product would be turned over.  Rather, a win-win situation is created when the cost of capturing factual information that will equally help both parties organize and review the discovery is shared. 

Up next:

Part V: Running a Document Inventory

Scanned Paper (part III)

Unitization:

Where One Document Ends, Another Always Begins

When you receive a set of scanned documents as part of your discovery, you should be able to visualize how those documents were kept in the original custodian’s desk drawer.  You should be able to identify which documents were kept together within a file folder or binder and where one document ends and the next begins.  Being able to recognize the order and organization of your discovery means that the documents were properly unitized.  

Having properly unitized documents is key to being able to effectively review scanned discovery.  You can efficiently move from one document to the next, as well as get a sense of how the documents relate to each other.  It is almost impossible to work with just loose pages, so you should always ask for discovery to be produced to you with its proper unitization. 

Keep in mind there are two types of unitization: 

1. Physical Breaks:

A document can simply be defined by its physical breaks.  This includes staples, paper clips, binders, folders, etc.  Unitization by physical breaks is usually done at the time of scanning, as the scanning operator is able to see where the breaks exist.  If you choose to unitize documents by their physical breaks, no relationships between documents are captured but it will be clear where one document ends and the next begins. 

2. Logical Document Determination (LDD):

What is logical about a stack of paper that has sat in somebody’s desk for years, you might ask?  Whether we want to admit it or not, the way those documents were kept is often a major part of the story a litigation team is trying to tell.  A common way to describe documents that are related is to say they are part of a family of documents. 

For example:

If you know that a spreadsheet was clipped to a memo, even though the memo made no mention of any attached spreadsheet, you have learned a telling piece of information about the relationship between those documents.

 

If the documents are given to you with a load file, the load file will act as your roadmap.  Typically, a production of single page .tiffs that would reflect a huge stack of loose paper if printed is accompanied by a load file that lays out where the document breaks are.  If the documents have been logically unitized, the load file will also identify the parent-child attachments.
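
As a rough sketch of how a load file can carry family relationships, the Python below reads a small comma-separated layout and groups attachments under their parents.  The column names (BegDoc, EndDoc, ParentID), the document numbers and the read_families helper are assumptions made up for this illustration; real load files vary by vendor and review platform.

```python
import csv
import io
from collections import defaultdict

# Hypothetical load file contents. A blank ParentID means the document
# is a parent (or stands alone); otherwise it names the parent's BegDoc.
LOAD_FILE = """BegDoc,EndDoc,ParentID
ABC0001,ABC0002,
ABC0003,ABC0005,ABC0001
ABC0006,ABC0006,
"""


def read_families(load_file_text):
    """Map each parent document to the list of its attachments."""
    families = defaultdict(list)
    for row in csv.DictReader(io.StringIO(load_file_text)):
        parent = row["ParentID"].strip()
        if parent:
            families[parent].append(row["BegDoc"])
        else:
            # Record the document even if nothing is attached to it.
            families.setdefault(row["BegDoc"], [])
    return dict(families)


families = read_families(LOAD_FILE)
print(families)  # {'ABC0001': ['ABC0003'], 'ABC0006': []}
```

In this made-up set, the two-page memo starting at ABC0001 has the three-page spreadsheet starting at ABC0003 as its attachment, and ABC0006 stands alone.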

If the documents are not unitized when you receive them, you may want to contact the source and ask them for a unitized set.  If the source does not have a unitized set, the best option is typically to contact a litigation vendor who is familiar with the process of unitization.  They usually have teams of people trained in using software specifically designed to create document breaks as well as identify document families.

Up next:

Part IV: Objective Coding – “Who, What, When” will help you figure out “Where and How”

Scanned Paper (part II)

Searchable Scanned Documents
Can you Really Find that Needle in the Haystack?

Searchable PDFs

PDFs created from scanned paper are typically just images. Hard to believe, when we think about all the things we can do with a PDF, including using it to create forms that can be filled out electronically and having it act as a container to hold a variety of other document formats.  Scanned paper only becomes searchable if it has been OCRed.

What is OCR, you might ask?

OCR stands for Optical Character Recognition.  I know that’s not of much help, but two of my goals are to provide you with interesting dinner conversation and obscure words to answer the latest Geek version of Trivial Pursuit.  The third, and most practical, goal is to allow you to communicate among your team and with vendors about getting your PDFs into a searchable format.

The process of having a document OCRed is actually quite miraculous and the technical aspects can get quite complex.  What is most important to note is that OCRing a PDF allows the text on the PDF to be captured and layered underneath the image in such a way that the text on the image is now searchable.  However, there is always a chance that the OCR within a document can be of poor quality, making the searching of the text inaccurate.

Why might you get poor quality OCR, you ask? 

Most likely, it has to do with the quality of the document being scanned into PDF, but here are some additional reasons to consider:

  • handwriting
  • the original paper is of poor quality (e.g., photocopy of a photocopy of a fax)
  • scanned in grayscale instead of black and white
  • scanned at a low resolution
  • black and white documents scanned as color
  • foreign language
  • graphics or lines on the page
  • size of font and type of font on the original document
  • …there are always more…

 

TIFFs with Associated Text Files

TIFFs are a lot like pictures or static images.  Unlike PDFs, the text seen on a TIFF can only be captured in a separate file called an Associated Text File, and the image and the text can only be married together to make the TIFF searchable in applications we like to call Evidence Review Platforms (ERPs). 

TIFFs can come as single page documents or multi-page documents.  If they come as single page documents, the only way you would know where one document ends and the next begins is by looking at a load file that uses document ID numbers to identify those document breaks.  Again, load files are designed to be interpreted in an ERP alongside your set of single page TIFFs, and when all of those pieces of information are put together, you are able to move, within the ERP, from one document to the next as well as search through the captured text. 
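
Here is a minimal Python sketch of what an ERP does with that information.  The page IDs, the True/False "starts a new document" flag and the group_pages helper are hypothetical and only illustrate the idea; actual load file formats differ from one platform to another.

```python
# Hypothetical page-level entries: (page image ID, starts a new document?)
pages = [
    ("ABC0001", True),
    ("ABC0002", False),
    ("ABC0003", True),
    ("ABC0004", False),
    ("ABC0005", False),
    ("ABC0006", True),
]


def group_pages(page_rows):
    """Group single page images into documents using the break flags."""
    documents = []
    for page_id, starts_document in page_rows:
        if starts_document or not documents:
            documents.append([page_id])      # open a new document
        else:
            documents[-1].append(page_id)    # continue the current document
    return documents


for doc in group_pages(pages):
    print(doc[0], "to", doc[-1], "-", len(doc), "page(s)")
# ABC0001 to ABC0002 - 2 page(s)
# ABC0003 to ABC0005 - 3 page(s)
# ABC0006 to ABC0006 - 1 page(s)
```

Without the break flags, the same six images would simply be six loose pages.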

However, just like the OCR of a PDF, Associated Text Files can vary in quality depending on the quality of the TIFF, the quality of the program used to create the Associated Text File and many other factors.  With all these moving parts, it is hard to really review TIFFs efficiently without some sort of specialized application – like an ERP.  However, once TIFFs are placed into an ERP, the load file and the Associated Text File fall right into place and allow you to perform text searches and to move fairly easily from one document to the next.

Up next:

Part III: Unitization – Where One Document Ends, Another Begins…

Scanned Paper (part I)

Does a Picture Really Speak a Thousand Words?
(part of an ongoing series about scanned paper)

Whether we like it or not, technology is not only becoming part of the legal world, but oftentimes taking it by storm.  Where we once received paper, we now receive .tiffs, .pdfs and native files.  Where we once could organize the paper in binders or boxes, we now use a combination of tools to view, search, organize and review our much more voluminous and complex set of discovery on our computers because there is simply too much material to print out. 

In adapting to the influx of electronic discovery, we have to realize that not all electronic discovery is created equal.  Native files such as Word documents and Excel spreadsheets come with a host of information about the file as part of their metadata (a topic we will cover later in our series), while paper that has been scanned and turned into electronic discovery in formats such as .tiff and .pdf is really just a set of pictures of the pieces of paper we once clipped, stapled and three-hole punched.  We can now do more to those pieces of paper once they have been scanned, but keep in mind that they are just pictures of the real thing and, unlike the native files we get, these pictures don’t really say as much as we want them to.

Things to consider: 

  • Is the document searchable?  Is there associated text with the .tiffs and have the .pdfs been OCRed?  If not, should you consider having the documents OCRed?
  • Are the documents unitized?  Do you know where one document ends and the next one begins?  Is there a load file that shows you the document breaks?  If not, should you consider unitization?
  • Was there objective coding done?  Is there a load file that provides you with the objective coding?  If not, should you consider objective coding? 
  • Should you run a document inventory to get a better handle on the various file formats that may be included in the discovery?  Are there color images?  Will you need to take that into consideration when you need to print the documents?  Are there formats that require you to have the associated application in order for you to view the document or database? Are there load files included that may contain objective coding?  
  • Do you already have programs that you are currently using that can handle the viewing, organization and review of scanned paper?  Can they handle one format and not another (e.g., Adobe can handle .pdfs but not .tiffs)?  Do you need to convert your scanned paper into one format that you can handle?  If you don’t already have a program, then what types of programs should you consider?
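
For the document inventory question in the list above, a very simple first pass is to count files by extension.  The Python sketch below assumes nothing more than a list of file names; the sample names and the inventory_by_extension helper are made up for illustration, and a real inventory would usually look at much more than the extension.

```python
from collections import Counter
from pathlib import PurePath


def inventory_by_extension(file_names):
    """Count files by lower-cased extension."""
    return Counter(
        PurePath(name).suffix.lower() or "(no extension)"
        for name in file_names
    )


# Hypothetical file names from a production.
sample = ["ABC0001.tif", "ABC0002.TIF", "memo.pdf",
          "budget.xls", "loadfile.dat", "README"]

counts = inventory_by_extension(sample)
for extension, total in sorted(counts.items()):
    print(extension, total)
```

To run it against an actual folder, one option is to pass in the names of the files found by pathlib's Path(folder).rglob("*"), keeping only the entries for which is_file() is true.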

Up next:

Part II: Searchable Scanned Documents
Can you Really Find that Needle in the Haystack?

Do Jurors and Judges Really Need to See Your Evidence?

The answer is yes, and courtroom presentation software can help you do it.

Not only do they need to see or hear it, but they need to understand, retain and recall it.  Whether we like to admit it or not, we live in an era where audiences expect a multi-media show every time they sit for a presentation.  Jurors and judges are no different, whether they are in a small town or a large city.  The question for trial lawyers becomes how do they present the facts of their case, and their client’s story, but do it in a way that grabs people’s attention?  We believe that courtroom presentation software should be an integral part of your litigation support toolbox.

While there may be a certain charm to writing with chalk on a blackboard or placing a piece of paper on an Elmo (a document camera), these options limit how a lawyer can present evidence in the courtroom.  For example, an attorney can only put a piece of paper on an Elmo if they have that piece of paper readily available, and even then they have no control over what part of that document the fact finder is focusing on during the evidence presentation.

A lawyer can only write so quickly, or so much, on a whiteboard/chalkboard, and the marked-up document may not get entered as an exhibit or taken back to the jury room.

With trial presentation software, an attorney can have available to present in court the critical evidence they want, as long as the documents, videos or audio files have been pre-loaded onto a laptop they plan to use at the hearing, motion or trial.

What is courtroom presentation software? It is a program that allows you to pull up a document for the jury/judge to view and blow up a word, line or paragraph on which the attorney wants the jury/judge to focus.  An example of such a program that is specifically designed for use in the courtroom is TrialDirector.  TrialDirector is also a media player, giving you the ability to pause a portion of a video or audio file for emphasis.

A lawyer using TrialDirector can compare documents side-by-side or point out important differences/similarities in documentary, photographic or video evidence.  TrialDirector can also be your virtual trial binder, allowing you to organize your materials for quick and easy presentation in the courtroom.  All of these techniques can be done with just a few keystrokes.

Another common presentation program that has been adapted for use in the courtroom is PowerPoint.  You should have it or something similar in your toolbox, but be aware that it does not give you the same level of flexibility and access to your evidence as a courtroom presentation program such as TrialDirector.  For example, with PowerPoint, each slide must be prepared in advance with fixed text or images, whereas with courtroom presentation software, you can show any file that has been loaded into the program on the fly.

We believe that these various presentation tools should be used to enhance, not replace, an attorney’s advocacy on behalf of their clients.  But as a sign of the times, courtroom presentation software is now so commonplace that there are even presentation apps for use with iPads and other tablet PCs.

Criminal Justice Act (CJA) panel attorneys can take advantage of a special offer provided by inData and purchase a copy of TrialDirector at a discounted price.  Additionally, the Office of Defender Services offers technology related training events that specifically focus on PowerPoint and TrialDirector.  These workshops have no fees for attendance and are open to all CJA panel attorneys and Federal Defender staff.  For those of you who are interested, the next workshop will be held in Providence, Rhode Island, July 21-23 and there are still a few spaces available.  Details about the workshop and registration information can be found on fd.org.