Three Types of PDFs

PDFs (portable document format files) are a common file format in federal criminal discovery. But are all PDFs created equal? As you all have experienced, the answer is no, they are not.

Think about PDFs in three distinct categories:

  1. True PDFs;
  2. Image-based PDFs; and
  3. Made-searchable PDFs.

For discovery review, these distinctions are important because they affect whether the PDF is searchable and how accurate your text searches within the PDF file will be. With voluminous discovery, the ability to search PDFs is critical for organizing and reviewing it.

  • True PDFs (also known as text-based or digitally created PDFs). These PDFs are created directly from software such as Microsoft Word or Excel, either by saving as PDF or by using the “print to PDF” function in those programs. They consist of both text and images. Think of these PDFs as having two layers: one layer is the image and the second is the text. The image layer shows what the document will look like if it is printed to paper. The text layer is searchable text carried over from the original Word file into the new PDF file (the technical term for this layer is “extracted text”). There is no need to make it searchable, and the new PDF will have the same text as the original Word file. An example of True PDFs that federal defenders and CJA panel attorneys will be familiar with is the pleadings filed in CM/ECF. The pleading is originally created in Word, and the attorney then either saves it as a PDF or prints to PDF and files that PDF with the court. Either process creates a PDF file with an image layer plus a text layer. In terms of usability, this is the best type of PDF to receive in discovery because its text searchability will be closest to that of the original file.
  • Image-based PDFs (also known as image-only PDFs). Image-based PDFs are typically created by scanning paper on a copier, taking photographs, or taking screenshots. To a computer, they are images. Though we humans can see text in the image, the file consists only of the image layer and lacks the searchable text layer that True PDFs contain. As a result, we cannot use a computer to search the text we see in the image, because that text layer is missing. Discovery is sometimes produced in image-based PDF format. When you come across image-based PDFs, first ask the U.S. Attorney’s Office what format the file was in originally. Second, ask whether they have it in a searchable format, and specifically whether they have it as a digitally created, text-based True PDF. They may not, as they often receive PDFs from other sources before they provide them to you, but you will want to know the format in which they hold it and the original format of the file (as far as they know).
  • Made-searchable PDFs (also known as “OCRed” PDFs). Image-based PDFs can be made text searchable by applying optical character recognition (OCR). CJA panel attorneys frequently use Adobe Acrobat Pro (or other PDF editing software) to make image-based PDFs searchable. During the OCR process, the software interprets each character in the image as text and adds a text layer to the image layer. Made-searchable PDFs resemble True PDFs in that they have both layers, but the searchability of the OCRed document depends on the quality of the image and on how recognizable the writing is. Keyword searches of the resulting text are often not 100% accurate.
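
As a rough illustration of the distinction above, a first triage step is to check which pages carry a text layer at all. The sketch below assumes the per-page text has already been pulled out of the PDF by whatever tool you use; the function names and labels are hypothetical shorthand for this example. It cannot tell a True PDF from a Made-searchable one, since both have a text layer.

```python
from typing import List


def pages_without_text(page_texts: List[str]) -> List[int]:
    """Return 1-based page numbers whose extracted text is empty or whitespace."""
    return [i for i, text in enumerate(page_texts, start=1) if not text.strip()]


def describe_text_layer(page_texts: List[str]) -> str:
    """Label a document by how many of its pages have a text layer.

    The labels are hypothetical names used only in this sketch.
    """
    if not page_texts:
        return "no pages"
    missing = pages_without_text(page_texts)
    if len(missing) == len(page_texts):
        return "image-based (no searchable text)"
    if missing:
        return "mixed (some pages need OCR)"
    return "searchable (True or Made-searchable)"


if __name__ == "__main__":
    # Page 1 has a text layer; pages 2 and 3 are scans with no text.
    sample = ["UNITED STATES DISTRICT COURT", "   ", ""]
    print(pages_without_text(sample))
    print(describe_text_layer(sample))
```

A "mixed" result is a common sign that scanned exhibits were appended to a digitally created filing, so only some pages would need OCR.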

The ESI Protocol (formally known as the Recommendations for Electronically Stored Information (ESI) Discovery Production in Federal Criminal Cases) noted the limitations of the OCR process on scanned paper:

“Generally speaking, OCR does not handle handwritten text or text in graphics well. OCR conversion rates can range from 50 to 98% accuracy depending on the underlying document. A full page of text is estimated to contain 2,000 characters, so OCR software with even 90% accuracy would create a page of text with approximately 200 errors.”
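
The arithmetic in that passage can be sketched as follows. The first function reproduces the quoted estimate (characters per page times the error rate). The second is an added illustration, not part of the ESI Protocol: it assumes each character is recognized independently, which is a simplification because real OCR errors tend to cluster, and it estimates how often a keyword of a given length survives OCR intact.

```python
def expected_errors(chars_per_page: int, accuracy: float) -> float:
    """Expected number of misread characters on a page."""
    return chars_per_page * (1.0 - accuracy)


def keyword_hit_probability(keyword_length: int, accuracy: float) -> float:
    """Chance that every character of a keyword was recognized correctly.

    Assumes independent per-character errors (a simplifying assumption).
    """
    return accuracy ** keyword_length


if __name__ == "__main__":
    # The quoted example: a 2,000-character page at 90% accuracy.
    print(round(expected_errors(2000, 0.90)))
    # An 8-letter search term at the same accuracy.
    print(round(keyword_hit_probability(8, 0.90), 2))
```

Under that independence assumption, an eight-letter term on a 90%-accurate page would match exactly less than half the time, which is why a keyword search that returns nothing on an OCRed PDF does not prove the term is absent.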

People ask how accurate software programs are in the OCR conversion. That is important, but the biggest factor for how searchable your OCR PDF will become is the underlying quality of the scanned image. A clean copy of a pleading will have high accuracy; a twice photocopied school paper record from the 1950s will be less accurate.

A quick way to compare the quality of the text layer against the image is to select the text in question in the PDF file (Control + A in Windows or Command + A on a Mac selects all the text), copy and paste it into a Word document, and then put the two files side by side and compare them visually.
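
For longer passages, the same comparison can be roughed out in code. This is a minimal sketch using Python's standard `difflib` module. It assumes you have pasted the OCR text into one string and typed a short, trusted transcription of the same passage into another; the function names are hypothetical.

```python
import difflib
from typing import List, Tuple


def similarity(ocr_text: str, reference_text: str) -> float:
    """Return a 0.0-1.0 similarity ratio, ignoring differences in whitespace."""
    a = " ".join(ocr_text.split())
    b = " ".join(reference_text.split())
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def differing_words(ocr_text: str, reference_text: str) -> List[Tuple[str, str]]:
    """List (reference words, OCR words) pairs wherever the two texts disagree."""
    ref_words = reference_text.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return pairs


if __name__ == "__main__":
    reference = "The defendant was arrested on March 3"
    ocr = "The defendant was arrcsted on March 3"
    print(similarity(ocr, reference))
    print(differing_words(ocr, reference))
```

The list of mismatched words is often more useful than the ratio, because it shows which search terms (names, dates, account numbers) were garbled.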

Why You Should Consider a Windows Computer and Laptop Buying Advice

Why do we recommend having a Windows computer for CJA panel attorneys?

One of the great modern-day debates is Windows versus Apple. Like college football rivalries (think Alabama versus Auburn or UCLA versus USC), this discussion can generate intense emotions on both sides of the aisle. Add into the mix the introduction of Chromebooks (using a Chrome OS operating system), and it can be difficult for CJA panel attorneys to decide what to use in their practice.

For this conversation, let’s talk about laptops. When talking to people outside of the federal criminal defense world, we would usually say choosing a laptop depends on personal preference. You should pick the laptop that makes sense to you and allows you to be most productive. If you find you are more productive with a Mac, that’s great. People may be drawn to one operating system or the other for any number of reasons. Typically, the most important factor in choosing an operating system is which one you have used the most.  The mechanics of how that system functions will seem more intuitive to you, because you have years of experience using it.

However, for federal criminal cases, we suggest having a Windows machine available to you.

Why?

Three reasons:

  1. The Department of Justice, as well as most law enforcement agencies, uses Windows computers. The systems they use to manage evidence and electronically stored information (ESI) will, by default, work on Windows machines. As a result, when they produce discovery to the defense, it will (usually) work on Windows machines.
  2. Several important software programs and digital forensics programs do not work on Macs. Examples include dtSearch, CaseMap, Cellebrite Reader (a free viewer that can speed up review of cellphone dumps) and FTK Imager (a free tool for viewing forensic images of computers the government seized, so that you can see what the computer looked like to the person who used it). You may not need these tools (there are workarounds or alternatives), but it is a limitation. In addition, while many file formats, such as Word documents, PDFs and PowerPoint files, can be opened on either Windows or Apple machines, other file types do not work natively on Macs. For example, certain proprietary audio and video files can only be played in applications that run on Windows. And now that all discovery provided by the U.S. Attorney’s Office is encrypted in transit, the tools they use for that are often designed to function on Windows machines and not Macs. Of course, you can try to work it out with the government so that you receive something Mac-friendly (and many times they will be accommodating), but it is not their default procedure.
  3. There are other costs associated with Macs. For one, PCs are often cheaper than their Mac counterparts. Additionally, programs offered at a discount to CJA panel lawyers by the Defender Services program are typically Windows-based.

Does this mean we are saying you should abandon your Mac? No. Plenty of us use both Windows and Macintosh computers at work or at home.  What we are saying is that you should consider having a Windows computer available to you to assist you in your CJA cases, as it can save you time and money in the long run.

Which laptop should I buy?

When it comes to buying a Windows laptop, there are hundreds of options.  The following minimum criteria should be considered when purchasing a new laptop:

  • 12.5- to 14-inch screen, which is typically a good balance between usability and portability. If you are going to be mobile, go on the smaller side. If you are going to be more stationary, consider the larger screen;
  • At least a Core i5 CPU;
  • At least 8 gigabytes (GB) of RAM;
  • Screen resolution of 1920 x 1080;
  • At least a 500 GB SSD (solid state drive);
  • 8+ hours of battery life;
  • Windows Professional – which gives you BitLocker, an easy way to encrypt your drive.

If you can afford to spend a little more, exceeding these minimum specs can result in better performance. For myself, I like to have a machine with at least a Core i7 CPU and 16 gigabytes of RAM. Many of our colleagues have found that with a more robust machine, the problems they had scrolling through large PDF files or viewing proprietary video files on their older, less powerful machines went away. However, price is always the top issue, so shop around and find what works for you and your budget.