Acrobat Training Guide – Text Recognition

Editor’s note: this is an update on the Acrobat Training Videos – Text Recognition video post. A related post is Three types of PDFs.

Introduction

This is a brief guide on the text recognition feature in Adobe Acrobat [1]. OCR, which stands for Optical Character Recognition, is a process that adds an invisible text layer to scanned paper documents or screenshots to make them text searchable. While OCR can be very helpful for searching, it is not perfect. The computer is interpreting pictures of letters and characters in documents and attempting to turn them into text. Sometimes those interpretations are incorrect (Figures 1 and 2).

Figure 1.
Figure 2.

The quality of the OCR text depends on many factors including the accuracy of the source document, its complexity and structure, font and language variations and the sharpness of the scan. For example, a document with clear, large print font (Figure 3) will generally OCR better than a fax copy with blurry text or handwriting (Figure 4).

Figure 3.
Figure 4.

With newer iterations of Adobe Acrobat, the OCR text accuracy has improved. When working with sets of scanned paper documents that were processed with older OCR engines, some people will spot check the accuracy of the OCR by running simple searches. Time permitting, they may then choose to re-OCR the documents. This can lead to more accurate searchable text.

A good practice for dealing with scanned paper PDF documents we want to work with is to first make a copy of the documents. For example, if we received a flash drive or a download from USAfx of scanned paper PDF files, it’s a good idea to first copy the files to a location on a computer or a network drive. This way we can work with the documents and add OCR text when needed, while still maintaining a set of the original files.
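
The copy-first habit can also be scripted. The sketch below is only an illustration in Python, assuming the received files sit in a single folder; the function name and the folder paths are hypothetical and have nothing to do with Acrobat itself.

```python
import shutil
from pathlib import Path


def make_working_copy(original_dir, working_dir):
    """Copy a folder of received PDFs so the originals stay untouched.

    Both arguments are hypothetical example paths; adjust them to your
    own folder layout.
    """
    original = Path(original_dir)
    working = Path(working_dir)
    if working.exists():
        # Refuse to overwrite an existing working copy by accident.
        raise FileExistsError(f"{working} already exists")
    # copytree copies the folder and everything beneath it.
    shutil.copytree(original, working)
    return working
```

For example, `make_working_copy("E:/production_01", "C:/Cases/working/production_01")` would leave the flash drive contents as they were and create a separate set for OCR work (both paths are made up for illustration).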

With a copy of one document open, the next step is to see whether the document is already searchable. When we open a PDF file, we are looking at an image of the document. Since the OCR text layer is invisible, we will not know whether it is searchable just by looking at it. There are a few things we can do to see if OCR is present.

If we go to the ‘Edit’ menu and choose ‘Select All’, or use the keyboard shortcut ‘Control+A’ (Figure 5), and we get a warning message that there are no text characters (Figure 6), this indicates that there is no searchable text. Alternatively, if we single click in a blank area of the page and the entire page turns blue, that also means there is no searchable text layer.

Figure 5.
Figure 6.

We can also try to find a word on the page using one of the search features in Acrobat. For example, when we run a find for the word ‘memo’ we get the same no text characters warning that we got by going to the ‘Edit’ menu and choosing ‘Select All’ (Figure 7).

Figure 7.

Starting the Text Recognition Process

To add an OCR text layer to a document, go to the ‘Tools’ menu and click on the ‘Scan & OCR’ button (Figure 8). When you activate this tool in Acrobat, an additional menu bar will appear at the top of the page. Choose the ‘In This File’ option (Figure 9). In most circumstances we will go with the ‘All pages’ default. Click on the blue ‘Recognize Text’ button to begin the process (Figure 10). A progress indicator will appear in the bottom right-hand corner as Acrobat processes each page. Acrobat will also automatically rotate pages, based on the optimal rotation for the text on that page (Figure 11).

Figure 8.
Figure 9.
Figure 10.
Figure 11.

While the speed at which Acrobat can OCR documents can vary depending on the complexity of the documents and the type of computer being used, a good general estimate is about 1000 pages per hour. With particularly large OCR jobs, you might want to wait until the end of the day to begin the process. Some offices have also set up a spare computer, dedicated to running various processes such as OCR, so nobody’s computer is tied up.
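
The 1000-pages-per-hour figure makes it easy to size a job before starting it. The following is a rough sketch of that arithmetic in Python; the function name is hypothetical, and the default rate is simply the general estimate quoted above, so actual speeds will vary.

```python
def estimate_ocr_hours(total_pages, pages_per_hour=1000):
    """Rough OCR run time in hours.

    The default rate is the guide's general estimate of about 1000
    pages per hour; pass a different rate if your machine is faster
    or slower.
    """
    if total_pages < 0 or pages_per_hour <= 0:
        raise ValueError("page count must be >= 0 and rate must be > 0")
    return total_pages / pages_per_hour


# A 12,500-page production at the default rate.
print(estimate_ocr_hours(12500))  # prints 12.5
```

A job of that size would be a reasonable candidate for an overnight run or a dedicated spare computer.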

When the Text Recognition Process is Complete

When the OCR process is complete, we can go back to the first page to make sure the document is now searchable. If we go back to the ‘Edit’ menu and choose ‘Select All’, the text on the document will be highlighted in blue while the blank areas surrounding the text remain white (Figure 12). A single click in a blank area no longer turns the whole page blue. If we search for the word ‘Memo’ again, using the find option, we will get a set of search results with the first hit on the first page of the document highlighted in blue (Figure 13).

Figure 12.
Figure 13.

Since we have now changed the document by adding an OCR layer to it, save the file so we lose none of the work we have just done.

Text Recognition in Multiple Files

We can also run the OCR process across multiple documents, by going to our OCR tool menu (Figure 14) and selecting ‘Or recognize text in multiple files’ (Figure 15). This is a handy option, as we often receive batches of documents that might need to be OCR’d.

Figure 14.
Figure 15.

You can choose to OCR an entire set of PDF files in a folder by selecting ‘Add Folder’ (Figure 16) and then navigating to where that folder is on your computer or on the server. By default, Acrobat will include all PDFs and subfolders within the selected folder (Figure 17).

Figure 16.
Figure 17.
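
Because a folder selection pulls in subfolders as well, it can help to preview which files a batch job would touch before starting it. This is a minimal sketch in Python that lists every PDF under a folder; the function name is hypothetical and the script is independent of Acrobat.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return every PDF under a folder, including those in subfolders."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        # Compare the extension case-insensitively so .PDF is included.
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

The length of the returned list gives a quick file count for the batch.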

When running the OCR process on multiple files, we are prompted to choose where to save the files before the OCR runs. Most users choose to save the files in the same folder selected at the start with the original file names (Figure 18). Acrobat will also launch a progress bar for this process (Figure 19).

Figure 18.
Figure 19.

If it is a large amount of information, estimate the page volume and run the process at a break or at the end of the day. The Acrobat help guide (https://helpx.adobe.com/acrobat/user-guide.html) is a great resource if you are interested in discovering more about the OCR process.

  1. The free Adobe Acrobat Reader software does not include the ability to OCR documents.

Acrobat Training Guide – Searching Fundamentals

Editor’s note: This is an update to the Adobe Acrobat Training Videos: Searching Fundamentals post. A related post is Three Types of PDFs.

Basic Search

This is a brief guide on the fundamentals of searching PDFs using Adobe Acrobat Pro. We will review how to run searches within a single PDF and across multiple PDFs. Searches can even be run on an entire folder of documents such as one that contains all the discovery you receive in a case. Searching in Acrobat will be useful only if the PDF files have searchable text. For scanned paper documents, you must make sure that they have been OCR’d first. OCR stands for ‘Optical Character Recognition’, and it is a process that reads pictures and turns them into letters and words so that they can be searched.

You can search not only the text of a document, but also any Adobe comments and bookmarks made on it. Searches can be run using either the “Find” or “Advanced Search” options. The way in which search results are displayed and what additional features may be available depend on the search tool chosen. The ‘Find’ tool can perform a quick search. With a PDF file open, display the ‘Find’ toolbar by choosing ‘Find’ from the main menu (Figure 1). There is also a ‘Find’ toolbar in the upper right-hand corner of the document which can be activated by clicking on the magnifying glass icon or by pressing ‘Control+F’ on your keyboard (Figure 2).

Figure 1.
Figure 2.

To perform a find, type a search term, for example ‘Memo’. Acrobat will provide a preview of the number of hits for that word in the document (Figure 3). After you hit enter on your keyboard, the search results will be shown highlighted in blue (Figure 4). If the term appears multiple times within a document, we can use the ‘Next’ and ‘Previous’ buttons to move from hit to hit. As we navigate through the search results, note that Acrobat highlights not only the word ‘Memo’ each time it appears, but also any word which includes the letters ‘M-E-M-O’, such as ‘memoranda’ and ‘memorandum’.

Figure 3.
Figure 4.

There is a drop-down button next to the search term where we can select ‘Whole Words Only’, ‘Case Sensitive’, or choose to include bookmarks or comments (Figure 5). We will go over these features in greater detail when we look at the ‘Advanced Search’ tool below.

Figure 5.

If we run a new find for the term ‘Xanadu’, a ‘No results found’ message appears, letting us know that no hits were found (Figure 6). We will only get this message if the document is searchable and the word appears nowhere in the document. If, instead, we get a scanned page alert message, Acrobat is letting us know there is no searchable text associated with the document. To run your search, you will first have to OCR the document. If you need guidance on how to OCR the document, refer to the Acrobat OCR tutorial on this website.

Figure 6.

Searching Comments

To search through just the comments, we can use the find tool in the comments list menu. To access this feature, select the ‘Comments’ icon in the upper right corner of the document (Figure 7). In this particular example we have 48 comments (Figure 8). Entering the term ‘memo’ in the ‘Search’ box filters the list down to 4 comments and highlights the results (Figure 9). You can also sort and filter the list based on certain criteria such as comment author and comment type. To access this feature, click on the ellipsis icon to the right of the search tool (Figure 10). This opens an ‘Options’ menu. The ‘Sort Comments’ menu is the first level down (Figure 11).

Figure 7.
Figure 8.
Figure 9.
Figure 10.
Figure 11.

Advanced Search

While the find tool is quick, easy, and useful, the ‘Advanced Search’ tool has more features and is the preferred means of searching for many people. To open the advanced search window, select ‘Advanced Search’ from the main menu, or use the ‘Shift+Control+F’ keyboard shortcut (Figure 12). Acrobat will launch a new ‘Advanced Search’ dialog box (Figure 13). To automatically resize this window so that it fits alongside the one showing your document, click on the ‘Arrange Windows’ button.

Figure 12.
Figure 13.

When we run an Advanced Search for the term ‘Memo’, Acrobat generates a list of the results with some context. We can navigate between the results by clicking on them. Acrobat will go to the page and highlight the result in blue (Figure 14).

Figure 14.

If we click on ‘New Search’ we can now either re-run the ‘Memo’ search or type in a new search term. This time, before we click the search button, let’s consider some of the additional features that we saw earlier when we used the find tool (see Figure 5 above). The ‘Whole Words Only’ and ‘Case Sensitive’ tools will limit the search based on the criteria selected (Figure 15).

Figure 15.

When we re-run the search now, with these options checked, we will only get ‘Memo’ with a capital ‘M’ because we typed in the term in the search box with that exact capitalization. Other words containing the letters ‘Memo’ like ‘Memorandum’ are excluded because we also limited our search to only the whole word, ‘Memo’ and not any words containing those letters (Figure 16).

Figure 16.
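
The effect of the two restricting options can be imitated with an ordinary regular expression. The sketch below is an analogy in Python, not a description of how Acrobat implements its search; the function name and the sample sentence are made up for illustration.

```python
import re


def find_hits(text, term, whole_words_only=False, case_sensitive=False):
    """Return the matched strings for a term under the two options."""
    pattern = re.escape(term)
    if whole_words_only:
        # \b anchors the match to word boundaries on both sides.
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [match.group(0) for match in re.finditer(pattern, text, flags)]


sample = "Memo to file. See the attached memorandum and the earlier memo."

# Default search: matches inside 'memorandum' too, in any capitalization.
print(find_hits(sample, "Memo"))
# prints ['Memo', 'memo', 'memo']

# Both options checked: only the exact whole word 'Memo'.
print(find_hits(sample, "Memo", whole_words_only=True, case_sensitive=True))
# prints ['Memo']
```

The second call drops the hit inside ‘memorandum’ (not a whole word) and the lower-case ‘memo’ (wrong capitalization), which mirrors the narrowing described above.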

While ‘Case Sensitive’ and ‘Whole words only’ will restrict the search, ‘Include Bookmarks’ and ‘Include Comments’ will expand it. When we run the ‘Memo’ search again with these options selected, any comments or bookmarks with the term will be included in the list of results (Figure 17). We can tell if the result is a comment or a bookmark by looking at the icon next to it. Just like with text results, we can navigate to a bookmark or comment result by clicking on it (Figure 18). These features can be used in various combinations to further define your search.

Figure 17.
Figure 18.

If we click on ‘New Search’ again, options we previously selected are still marked (Figure 19). Acrobat retains the selections you make in the ‘Advanced Search’ window until you change them. Review these options before each search, as the number and type of search results can vary greatly depending on what options are marked.

Figure 19.

Advanced Search – Multiple PDFs

Advanced search also enables you to look for a search term in multiple PDFs. This is helpful as often we receive multiple files we want to search through. Without this option, each file would need to be opened and searched separately. From the ‘Advanced Search’ window, select ‘All PDF Documents in’ (Figure 20) and then use the down arrow to choose a location. There are ‘Desktop’, ‘My Documents’ options and drive letter options, as well as a ‘Browse for Location’ option. We recommend that you choose the ‘Browse for Location’ option and navigate to a specific folder. This option will include the PDF files within the folder and any subfolders in that location (Figure 21).

Figure 20.
Figure 21.

Let’s choose a location where we have our discovery materials and run a search for ‘Memo’ again (Figure 22). The results appear in page order, nested under the name of each document (Figure 23). You can expand or collapse the list of results in a specific file by alternately clicking on the small arrow next to each result. As before, each search result includes some context (Figure 24).

Figure 22.
Figure 23.
Figure 24.

This time, when we click on a hit in the results list, Acrobat will open the document, showing exactly where the search term appears (Figure 25).

Figure 25.

Next to the ‘New Search’ button at the top, there is a ‘Save results to file’ button (Figure 26). Selecting this button allows us to create a report in either PDF or CSV format. We recommend saving the results to PDF as this creates a nice summary of the results with links to the documents (Figure 27).

Figure 26.
Figure 27.

Electronic Exhibit Sticker

Preparing exhibits for trial or court hearings, though not glamorous, is an essential task in the practice of courtroom litigation. Depending on the volume and type of exhibits, this necessary task can quickly turn tedious if you must print each exhibit, affix a physical sticker, fill out the exhibit and case information by hand, then scan and submit the stickered exhibit. In the heat of trial, where last-minute changes take place frequently, it is easy to make mistakes. However, with the right type of technology, such as Adobe Acrobat Pro (or Standard), this process can be done more smoothly and more quickly than the old-school method of stickers and paper, with fewer opportunities for error. If you have Adobe Acrobat*, we suggest considering digital (electronic) exhibit stickers for your next case.

*Acrobat Standard or Pro, not the free “Reader” version.

This post will walk you through how you can create digital exhibits on your own, including the process of installing a sticker that takes the form of a custom Acrobat stamp. The stamp will allow you to quickly fill in the exhibit and case numbers for your case, and will automatically remember your previous entries the next time you use it.

First, follow the instructions below to install the electronic exhibit sticker.

Installation

  1. Download and copy the exhibit_stickers.pdf file to a location that is easily accessible, such as your Desktop. (NOTE: You can delete this PDF file once we are finished with the installation.)
  2. Open Acrobat and press CTRL-K to open the Preferences menu. Scroll down on the left to “Security (Enhanced)”. Click the “Add File” button, which will open a file explorer window.
  3. Type %appdata% into the address bar and press enter.
  4. This will open a new folder. Open the “Adobe” folder, then the “Acrobat” folder. You may see folders for the different versions that have been installed like a “2017”, “2020” or a “DC” folder. Open the “DC” folder if you have that, or else the highest folder year you have. Open the “Stamps” folder. Find the “exhibit_stamps.pdf” file you saved and drag or copy and paste it into the Stamps folder. Select the file and click “Open.”
  5. This will take you back to the Preferences screen. Verify that exhibit_stamps.pdf is listed inside the box. If the file is there, click “OK”. Then close out of all Acrobat windows.
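
The folder choice in the installation steps (“DC” if present, otherwise the highest year) can be written out as a small rule. The sketch below is only an illustration in Python of that rule and of the resulting path; the function names are hypothetical, and the folder layout is taken from the steps above rather than from Adobe documentation.

```python
import re
from pathlib import PureWindowsPath


def pick_version_folder(folder_names):
    """Choose "DC" if present, otherwise the highest year-named folder."""
    if "DC" in folder_names:
        return "DC"
    # Keep only four-digit names such as "2017" or "2020".
    years = [name for name in folder_names if re.fullmatch(r"\d{4}", name)]
    if not years:
        raise ValueError("no DC or year-named Acrobat folder found")
    return max(years, key=int)


def stamps_folder(appdata, folder_names):
    """Build the Stamps path described in the installation steps."""
    version = pick_version_folder(folder_names)
    return PureWindowsPath(appdata) / "Adobe" / "Acrobat" / version / "Stamps"
```

Here `appdata` stands for whatever folder `%appdata%` opens on your computer, and `folder_names` is the list of version folders you see inside the “Acrobat” folder.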

Usage

  1. Open the PDF that needs an exhibit sticker. Select the “Comment” tool from the list along the right side of the screen.
  2. This will open a new toolbar. Click on the Stamp tool icon, navigate to the “Exhibit Sticker” menu, then click on the Exhibit sticker image.
  3. The first time you use the sticker, a window will pop up. Check “Don’t show again” and click “Complete.” There is no need to enter any information.
  4. Your cursor will now become a floating exhibit sticker. Click where you would like to place the sticker. Do not worry if the initial placement is not perfect; you can move the sticker to a different part of the page and even resize the sticker after you have placed it.
  5. When you click to place the stamp, a window will pop up asking you to enter an Exhibit Number. Enter the Exhibit Number in the box and press OK.
  6. Next, a window will pop up asking you for a Case Number. Enter the Case Number and press OK.
  7. This will place an exhibit sticker on your PDF that contains the Exhibit Number and Case Number. You can move and resize the sticker if needed. If you need to remove or change any of the information on the sticker, you can right click on the sticker, select “Delete” and create a new sticker.
  8. To permanently affix the sticker to the document, you will need to print the document to a new PDF. Go to the File menu and select Print. Now change your printer to “Adobe PDF”, change the “Comments & Forms” selection to “Document and Stamps”, then press print and save your new copy to the location of your choosing.
  9. That’s it. You will now have a permanently stamped PDF document. The next time you want to stamp a document, Acrobat will pre-fill your last entered Exhibit Number and Case Number, so it will be easier to keep track of your exhibits if you are marking multiple documents in one sitting, and you will not have to re-enter the case number each time.

If you need any assistance with installation, you can contact me at carl_adams@fd.org.

Working with Email in Discovery: Processing Options and Review Workflows

Introduction

Technologies that allow for easier review of ediscovery in native format have become more affordable and accessible. Working with files in native format has several advantages, including avoiding loss of potentially relevant information, access to metadata and better searchability. Email is one of the most common of the native formats produced in discovery. This article will explore some approaches for processing email and identify a number of low-cost tools that can assist. (This article deals with the processing but not the substantive review of emails for case analysis. For that, you should consider other tools such as CaseMap or, for larger collections of emails, review platforms such as Casepoint or IPRO.)

The tools and approaches you select will depend on a combination of three factors: (1) volume, (2) format(s) and (3) the defense team goals. While a single tool might facilitate a discrete goal, more involved goals may require different approaches with a combination of tools. These scenarios can be ends in themselves or phases in an overall workflow. This article does not try to anticipate every possible situation that might arise but will explore a few common scenarios.

Many electronic file formats produced in the course of discovery, like Acrobat, Excel and Word files, are generally accessible via standard software available on most computers. However, email file formats like MSG, EML, PST, and MBOX present more of a challenge, as the recipient often may not know how to access these files.

Below is a quick overview of some of the most common email file formats encountered in eDiscovery that will be discussed in this article:

  • MSG: A Microsoft format for single emails. Often associated with the Microsoft Outlook email client.
  • PST: A Microsoft format for a collection of emails (as well as other potential items including: Calendars, Contacts, Notes and Tasks). Often associated with the Microsoft Outlook email client.
  • EML: Email format for single emails used by many email clients including Novell GroupWise, Lotus Notes, Windows Mail, Mozilla Thunderbird, and Postbox.
  • MBOX: Email format for a collection of emails (as well as other potential items including: Calendars, Contacts, Notes and Tasks) used by many email clients including Novell GroupWise, Lotus Notes, Windows Mail, Mozilla Thunderbird, and Postbox.

All four formats are typically received in discovery and subpoena returns. Google Takeout, a service offered by Google which allows you to download your email, will produce emails in the MBOX format.

Working with these email formats consists of understanding which tool is compatible with which file format, and which tool or set of tools will most effectively allow you to achieve your goals. Below is a table that maps out some of the various tools available in terms of which file formats they are able to process, their functionality and cost. Before using any of these tools, make sure to work with a copy of the data as opposed to the original.

Software | Compatible Formats | Cost | Functionality
---------|--------------------|------|--------------
Mozilla Thunderbird with the Import Export Tools add-on | EML, MBOX | free | View emails, convert to EML, HTML, MBOX and PDF (without attachments)
Mbox Viewer | EML, MBOX | free | View emails, convert to HTML or PDF (without attachments)
PSTViewer Pro | MSG, EML, PST, MBOX | $129 | View emails, convert to multiple formats including EML, HTML, MBOX and PDF (includes advanced PDF attachment image options)
MS Outlook | MSG, EML, PST | $159 or $69.99 per year | View emails, export to MSG, PST and PDF (requires Acrobat integration)
Aid4Mail | MSG, EML, PST, MBOX | $299 per year | Convert email to multiple formats including MSG, HTML, EML, PST, MBOX and PDF
dtSearch | MSG, EML, PST, MBOX | $199 or *free | Search and view results in email viewer panel (no conversion or export options)

*For information about a free license of dtSearch available to CJA Panel Attorneys see: nlsblog.org/2014/03/25/dtsearch-desktop

This article will demonstrate how to work with emails through a series of discrete tasks:

  1. Generating a list of emails for review.
  2. Viewing emails.
  3. Searching, tagging, and converting emails.
  4. Working with email attachments.

1. Generating a list of emails for review.
An initial task at the outset of a case might be to generate an index to facilitate early case assessment. Some programs, like PSTViewer Pro, will work with many formats while other programs, like Mbox Viewer, work with a more limited number of formats.

  • Example 1 – Generating a list using Mbox Viewer:
    Mbox Viewer is a free tool that allows you to preview emails and generate a list of emails by simply selecting messages in the viewer, doing a right click and selecting print to CSV, then selecting which fields you would like to include in the spreadsheet (Figure 1-1).
Figure 1-1
  • The resulting CSV file contains a table that can be opened in Excel or imported into other programs (Figure 1-2).
Figure 1-2
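
A comparable index can also be produced from an MBOX file with a short script. The sketch below uses Python's standard `mailbox` and `csv` modules; it is an alternative illustration, not how Mbox Viewer works internally, and the function name and default field list are assumptions chosen for the example.

```python
import csv
import mailbox


def mbox_to_csv(mbox_path, csv_path, fields=("Date", "From", "To", "Subject")):
    """Write one CSV row per message using the chosen header fields.

    Returns the number of messages written. Header values are written
    as stored in the file, so encoded (non-ASCII) subjects are not
    decoded here.
    """
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(fields)
            for message in box:
                # Missing headers become empty cells.
                writer.writerow([str(message.get(name, "")) for name in fields])
                count += 1
    finally:
        box.close()
    return count
```

As with the tools above, run this against a copy of the MBOX file rather than the original.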

2. Viewing emails.
While a list will provide you with a high-level overview of the emails you have in terms of subject matter, players involved and so forth, a closer review will require a different approach. MS Outlook, Mbox Viewer and Mozilla Thunderbird are all tools which can be utilized for this purpose.

  • Example 2.1 – Viewing emails received in PST format using MS Outlook:
    Within Outlook open the ‘File’ menu, select the ‘Open & Export’ button, then ‘Open Outlook Data File’. Navigate to the folder containing the PST file (Figure 2-1) and select the file to import. Outlook will create a folder within the ‘Personal folders’ from where you can conduct a review of the files.
Figure 2-1
  • Example 2.2 – Viewing emails received in MBOX format using Mozilla Thunderbird with the Import Export Tools add-on:
    The free ‘Import Export Tools’ add-on available for Mozilla Thunderbird allows for the import and viewing of MBOX files. After the add-on has been installed, right click on ‘Local Folders’, then choose ‘Import mbox file’ from the ‘ImportExportTools NG’ menu and navigate to the folder containing the MBOX file (Figure 2-2). This will copy the MBOX file into Thunderbird’s ‘Local Folders’ where, similar to Outlook, you can conduct a review of the emails within.
Figure 2-2

3. Searching, tagging, and converting emails.
The approaches discussed in the two previous sections can be useful when you simply want to gain a high-level view of the emails, or take a closer look at particular emails in a smaller collection. However, when you are working with large volumes of emails, manual review becomes impractical and inefficient, and taking advantage of the search and tag functionality of the available tools is a better approach.

  • Example 3 – Searching, tagging and exporting within MS Outlook:
    Outlook can be utilized to conduct keyword searches, and relevant files can be tagged and exported as either MSG or PDF files (using the Acrobat integration that is included with licensed copies of Acrobat Standard and Pro). To tag an email, right click and select ‘Categories’ then select a color-coded tag (Figure 3-1). You can also customize the tags using the ‘New Category’ option within the ‘Category’ dialog box (Figure 3-2).
Figure 3-1
Figure 3-2
  • You can then filter and tag a selection of emails (Figure 3-3) and save them to a folder as either individual MSG files or a new PST file. If you have a licensed version of Adobe Acrobat, the integration menu within Outlook can be used to convert messages into individual PDFs or a combined ‘PDF Portfolio’ (Figure 3-4).
Figure 3-3
Figure 3-4
  • When choosing an export format, be aware of the limitations of the different conversion formats. The HTML and PDF export formats typically will not include the complete email metadata. Email header information, which may include important details such as the IP addresses used, may be lost during conversion. Export formats such as MSG, EML, MBOX and PST retain much more of the original email metadata.
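
To see what kind of header information is at stake, a single EML file can be inspected with Python's standard `email` package. The sketch below is an illustration only; the function name is hypothetical, and which headers are present depends entirely on the mail systems that handled the message.

```python
from email import policy
from email.parser import BytesParser


def header_summary(eml_bytes):
    """Return routing-related headers from the raw bytes of an EML file."""
    message = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    # Each server that relays a message normally adds a Received header.
    received = [str(value) for value in message.get_all("Received", [])]
    # X-Originating-IP is a non-standard header that only some mail
    # systems add; it is None when absent.
    origin = message.get("X-Originating-IP")
    return {
        "received": received,
        "originating_ip": None if origin is None else str(origin),
    }
```

The raw bytes can be read with `open(path, "rb").read()`. Headers like these are typically what a PDF or HTML export leaves out.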

4. Working with email attachments.
Emails often have attachments which, in addition to the body of the email, can contain substantive relevant information. The programs discussed in this post vary greatly in how they handle attachments during format conversion. While PDFs are generally easier to Bates stamp or turn into exhibits, be aware that some of the programs are not able to include the attachments when exporting to PDF.

  • Example 4.1 – Exporting email with attachments using Mozilla Thunderbird with the Import Export Tools add-on:
    Thunderbird offers several export options, including the ability to batch export relevant emails when using the Import Export Tools add-on. It cannot embed or append attachments when exporting messages to PDF; however, it can export emails to the EML format (with attachments embedded) as well as to an HTML format, which will include links to exported copies of the attachments (Figure 4-1).
Figure 4-1
  • Example 4.2 – Exporting email with attachments using PSTViewer Pro:
    PSTViewer Pro is yet another option for format conversion, and is a great tool to use in conjunction with tools like Thunderbird or Outlook. It can convert to many formats and includes some advanced PDF conversion options. When converting to PDF, attachments can either be embedded or “imaged” (Figure 4-2). The “imaged” option will convert supported attachments into PDF pages and append them to the PDF version of the email (Figure 4-3).
Figure 4-2
Figure 4-3
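
Before converting, it can be useful to check which messages actually carry attachments. The following is a minimal sketch using Python's standard `email` package to list the attachments embedded in an EML message; the function name is hypothetical and the script is unrelated to Thunderbird or PSTViewer Pro.

```python
from email import policy
from email.parser import BytesParser


def list_attachments(eml_bytes):
    """Return (filename, size_in_bytes) for each attachment in an EML message."""
    message = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    found = []
    for part in message.iter_attachments():
        # decode=True undoes the transfer encoding (for example base64).
        payload = part.get_payload(decode=True) or b""
        found.append((part.get_filename() or "(unnamed)", len(payload)))
    return found
```

An empty list means the message has no attachments to lose in a PDF conversion.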

Conclusion

As shown in this article, many tools are available for working with emails, but they are not universally compatible with all email formats and do not share the same functionality. This requires careful thought about how to leverage and integrate the tools. The best path forward through this thicket is to know what your goals are before you select your tool. Defining your goal early will help you select which tool or combination of tools you should use to develop an effective workflow that matches both the set of data you are working with and the needs of your case.

Three Types of PDFs

PDFs (portable document format files) are a common file format in federal criminal discovery. But are all PDFs created equal? As you all have experienced, the answer is no, they are not.

Think about PDFs in three distinct categories:

  1. True PDFs;
  2. Image-based PDFs; and
  3. Made-searchable PDFs.

For discovery review, these distinctions are important because they affect whether the PDF is searchable and the accuracy of your text searches within the PDF file. With voluminous discovery, the ability to search PDFs is critical for organizing and reviewing it.

  • True PDFs (also known as text-based or digitally created PDFs). These PDFs are created using software such as Microsoft Word, Excel, or using the “print to PDF” function in those programs. They consist of both text and images. We should think about these PDFs having two layers – one layer is the image and a second layer is the text. The image layer shows what the document will look like if it is printed to paper. The text layer is searchable text that is carried over from the original Word file into the new PDF file (the technical term for this layer is “extracted text”). There is no need to make it searchable and the new PDF will have the same text as the original Word file. An example of True PDFs that federal defenders and CJA panel attorneys will be familiar with are the pleadings filed in CM/ECF. The pleading is originally created in Word, but then the attorney either saves it as PDF or prints to PDF and they file that PDF document with the court. Using either process, there is now a PDF file created with an image layer plus text layer. In terms of usability, this is the best type of PDF to receive in discovery as it will have the closest to text searchability of the original file. Click here to see an example of a True PDF.
  • Image-based PDFs (also known as image-only PDFs). Image-based PDFs are typically created through scanning paper in a copier, taking photographs or taking screenshots. To a computer, they are images. Though we humans can see text in the image, the file only consists of the image layer but not the searchable text layer that True PDFs contain. As a result, we cannot use a computer to search the text we see in the image as that text layer is missing. There are times when discovery is produced, it will be in an image-based PDF format. When you come across image-based PDFs, ask the U.S. Attorney’s Office in what format was that file originally. Second, ask if they have it in a searchable format and specifically if they have it in a digitally created, True, Text-based PDF format. They may not, as they often receive PDFs from other sources before they provide them to you, but you will want to know what is the format in which they have it in, and what is the original format of the file (as far as they know). Click here to see an example of an Image-based PDF.
  • Made-searchable PDFs (also known as “OCRed” PDFs). Image-based PDFs can be made text searchable by applying optical character recognition (OCR). CJA panel attorneys frequently use Adobe Acrobat Pro (or other PDF editor software) to make image-based PDFs searchable. During the OCR process, the software program interprets each character on the image as text and adds a text layer to the image layer. Made-searchable PDFs are like True PDFs, but the searchability of the OCRed document will depend on the quality of the image, or the recognizability of the writing. They are often not 100% accurate when you do keyword searches of the text. Click here to see an example of a Made-searchable PDF.
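
The distinction between the three types can be summarized as a rough decision rule. The Python sketch below is only an illustration: the `PageFacts` fields and the `classify_pdf` function are hypothetical names, and how you determine whether a page has a text layer or is a full-page scan is left out. Real files can mix page types, so treat the result as a first guess rather than a definitive answer.

```python
from dataclasses import dataclass


@dataclass
class PageFacts:
    """Hypothetical per-page observations; gathering them is out of scope here."""
    has_text_layer: bool      # the page contains selectable, searchable text
    is_full_page_image: bool  # the page is essentially one scanned picture


def classify_pdf(pages: list[PageFacts]) -> str:
    """Guess which of the three PDF types a document is (rough heuristic)."""
    if not pages:
        raise ValueError("document has no pages")
    # No searchable text anywhere: only the image layer exists.
    if not any(p.has_text_layer for p in pages):
        return "image-based"
    # Text sits on top of full-page scans: likely added later by OCR.
    if all(p.is_full_page_image for p in pages):
        return "made-searchable"
    # Otherwise assume the text came from the original application.
    return "true"


print(classify_pdf([PageFacts(has_text_layer=True, is_full_page_image=True)]))
```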

The ESI Protocol (formally known as the Recommendations for Electronically Stored Information (ESI) Discovery Production in Federal Criminal Cases) notes the limitations of the OCR process on scanned paper:

“Generally speaking, OCR does not handle handwritten text or text in graphics well. OCR conversion rates can range from 50 to 98% accuracy depending on the underlying document. A full page of text is estimated to contain 2,000 characters, so OCR software with even 90% accuracy would create a page of text with approximately 200 errors.”
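
The arithmetic in the quoted passage can be sketched in a few lines of Python. The 2,000-character page and the accuracy figures come from the quote; the function name is ours, and the calculation assumes errors are spread evenly across characters, which real scans rarely do.

```python
def expected_ocr_errors(chars_per_page: int, accuracy: float) -> int:
    """Estimate misrecognized characters on a page at a given per-character accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(chars_per_page * (1.0 - accuracy))


# The low end, the quoted example, and the high end of the range in the quote.
for accuracy in (0.50, 0.90, 0.98):
    errors = expected_ocr_errors(2000, accuracy)
    print(f"{accuracy:.0%} accuracy -> about {errors} errors per page")
```

Even at the top of the quoted range, a full page would still be expected to contain dozens of wrong characters, which is why keyword searches on OCR text can miss hits.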

People often ask how accurate software programs are at OCR conversion. That is important, but the biggest factor in how searchable your OCR PDF will become is the underlying quality of the scanned image. A clean copy of a pleading will have high accuracy; a twice-photocopied paper school record from the 1950s will be less accurate.

A quick way to see how the quality of the text compares to the image is to select the text in question in a PDF file (you can use Control + A in Windows or Command + A on a Mac to select all the text), and then copy and paste the text into a Word document. Put the two files side by side and visually compare them.
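
When a trusted transcription of the page is available (for example, a passage retyped by hand), the side-by-side comparison can be roughly approximated in code. The sketch below uses Python's standard `difflib` module. The two sample strings are invented for illustration, and the similarity ratio is only a rough indicator, not a formal measure of OCR accuracy.

```python
import difflib


def ocr_similarity(reference: str, ocr_text: str) -> float:
    """Return a 0-1 similarity ratio between trusted text and pasted OCR text."""
    return difflib.SequenceMatcher(None, reference, ocr_text).ratio()


def differing_words(reference: str, ocr_text: str) -> list[tuple[str, str]]:
    """List (reference, ocr) word runs that do not match."""
    ref_words, ocr_words = reference.split(), ocr_text.split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words)
    return [
        (" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented sample: the OCR text misreads one letter and one digit.
reference = "The defendant appeared in court on March 3"
ocr_text = "The defendant appeared in c0urt on March 8"
print(f"similarity: {ocr_similarity(reference, ocr_text):.2f}")
print(differing_words(reference, ocr_text))
```

The word-level list is the useful part for review: it points to the specific terms (names, dates, numbers) that a keyword search would fail to find.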

Acrobat DC New Features

All of you use Adobe Acrobat on a daily basis.  Whether it is Adobe Acrobat Reader, Standard or Pro, it is an excellent tool for legal professionals for everything from saving pleadings to file with the court’s case management/electronic case file system to reviewing discovery.  Some of you have been using Acrobat for a while and know that Adobe comes out with new versions every couple of years.  The latest version of Acrobat no longer uses a release number to distinguish itself (as Adobe Acrobat XI did). Instead it is called DC, which stands for Document Cloud, and the version is labeled by the year of release (Adobe Acrobat DC 2016 is the most recent version).  Like many other software companies, Adobe is moving to a cloud-based service, giving users the option of working on multiple devices seamlessly if they choose to store their files online.  Though Acrobat DC is designed for cloud use, users do not have to store their documents remotely, and they can continue using it as a desktop program as they always have.

Acrobat DC has a new look compared to previous versions, has been designed to be tablet and cell phone friendly, and gives users the ability to work on a document from different devices seamlessly. The addition of a user friendly tabbed tool bar makes switching from one document to another that much easier.

The “Home” tab shows the most recent files you have worked with.  You can also search for a file in the search bar, or open a file by clicking “My Computer” and navigating to it, or by going to the File menu and selecting Open.

Once you open a document, the “Document” tab appears at the top of the screen, allowing you to easily navigate from the Document to the Tool Center to the Home page.

The “Tools” tab, otherwise known as the DC Tool Center, centralizes all the features of Acrobat in one place for easy access. Now you can quickly find the tool you need without having to remember which menu in the tools section to navigate to.

The “Search Tools” option in DC is intuitive and easy to use. If you want to OCR a document, type OCR in the “Search Tools” section of the Tool Center and all the toolsets related to recognizing text will appear.

The tool pane that users see when looking at a document can be customized. You can add a tool to the tool pane by selecting “Add Shortcut” from the Tool Center or by right-clicking in the Tool Pane when searching for a tool and adding it there.

When Tool Groups are opened, they are automatically pinned to the top of the screen. The Tool Group stays open until you close it or open another tool.

DC gives you multiple ways of accessing the tools you are looking for and then quickly going back to working with your documents.

The new tabbed tool bar is just one feature of Acrobat DC that makes upgrading worthwhile.  More features will be highlighted in upcoming posts so stay tuned.

Adobe Acrobat: “Renderable Text”

When working with PDF documents you may encounter a “renderable text” error message.  This message will sometimes occur when trying to make a scanned paper PDF file text searchable (also known as adding OCR to a document).

Depending on the version of Acrobat you have, the exact wording of the message may vary.

“Renderable text” is typically text that has been added to a scanned paper image (like a header, footer or Bates number) through a non-Acrobat program.  The way this text is encoded into the page can cause Acrobat to disallow additional searchable text (OCR text).

This message can certainly be annoying and it can also be significant as it can limit your ability to run searches.  In Acrobat, you will be unable to add new searchable OCR text, or improve the quality of the existing OCR, until the error is fixed.

If you’ve seen this message before, and have tried to fix the document without success, you are not alone!  We have spoken with a number of people over the years who have come up with some creative solutions.  Though we have yet to find “one solution” that will always fix this particular error, here are a number of possible solutions (results will vary depending on the cause of the error):

Solution 1: Obtain a version of the document with OCR.

  • It may seem simplistic, but if you receive documents without searchable OCR, ask for it.  Often the person or organization that gave it to you will want to search the files themselves and may already have a copy that has been OCR’ed.  Even if the documents they give you generate “renderable text” error messages, you will still be able to search any of the existing OCR text within the files.

Solution 2: If the files are from PACER / ECF, download a new copy.

  • The default download settings in PACER / ECF will add “purple” headers with the case number (which will cause a “renderable text” error message).  If you can find the document again in PACER / ECF, download it with the header option turned off.

Solution 3: Run “Add Tags to Document” (available in Acrobat Pro).

  • If you have Acrobat Pro installed there is a special “Accessibility” menu where you can run “Add Tags to Document”.  For certain PDFs, running this option will clear up the issue and allow OCR to be run on the document.

Solution 4: Print the document to PDF (available in Acrobat Standard and Acrobat Pro).

  • If you have Acrobat installed (Standard or Pro) you’ll probably also have access to an “Adobe PDF” virtual printer.  By printing the document to this virtual printer, the new PDF that is created will often avoid having the renderable text issue.

Solution 5: “Sanitize” the document then rerun OCR (available in Acrobat Pro).

  • From the “Protection” menu run “Sanitize Document”.  This will remove all of the document metadata including some of the rendered text that might be causing the error.
  • Re-run the OCR process.

Solution 6: Convert to TIFF files and back, and then re-run OCR (available in Acrobat Standard and Acrobat Pro).

  • Open the PDF document in Acrobat and choose “File > Save As”.
  • In the “Save As” dialog box, choose TIFF (*.tif, *.tiff) from the Save As Type (Windows) or Format (Mac OS) pop-up menu. Specify a location, and then click Save.  Acrobat saves each page of the PDF document as a separate, sequentially numbered TIFF file.
  • Combine the single pages back into a multipage document and re-run the OCR process.
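
One practical snag when recombining the pages is file order: a plain alphabetical sort puts page 10 before page 2. The Python sketch below shows a number-aware ("natural") sort that could be used to check the order before combining. The file names are hypothetical, since the exact names Acrobat assigns may differ by version, and the helper names are ours.

```python
import re


def natural_key(filename: str) -> list:
    """Split a name into text and integer chunks so 'Page_10' sorts after 'Page_9'."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", filename)
    ]


def order_pages(filenames: list[str]) -> list[str]:
    """Return the file names in page order using a number-aware sort."""
    return sorted(filenames, key=natural_key)


# Hypothetical names for three exported pages, listed out of order.
pages = ["exhibit_Page_10.tif", "exhibit_Page_2.tif", "exhibit_Page_1.tif"]
print(order_pages(pages))
```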

Solution 7: Convert to XPS file format and back, and then re-run OCR.

  • If your computer has the “XPS” virtual printer installed (it comes with many versions of MS Office), then print the file using the “Microsoft XPS Document Writer” printer.
    • The XPS printer will ask you to save the file.
    • Convert the saved XPS file to PDF.
    • Re-run the OCR process on the new PDF.

Solution 8: Try running the OCR using a different program.

Adobe Acrobat Training Videos: Searching Fundamentals

Editor’s note: there is an updated Acrobat Training Guide – Searching Fundamentals post.

Previous video – Text Recognition

Adobe Acrobat Pro is one of the most popular computer software programs on the market for FDO and CJA panel attorneys.  Since so much of the discovery we currently receive in criminal cases is provided in paper or scanned paper format, Acrobat Pro is an excellent tool to help you to better organize and review it.

In our team’s continued efforts to provide resources to CJA panel attorneys and FDO staff, we are creating a series of training videos. Each short video will address a specific feature in a computer software program, with our first set focused on Adobe Acrobat Pro XI.

Future videos we are developing will also be posted on this blog.  Make sure to check back in or sign up to subscribe to our blog to get notices of new posts by email.

These videos do not take the place of hands-on training sessions where we can get in depth about a variety of software programs and legal strategies for addressing complex cases, but they will hopefully provide you with some basic background information that can help you in your cases.

Adobe Acrobat Training Videos: Text Recognition

Editor’s note: there is an updated Acrobat Training Guide – Text Recognition post.

Next Video – Searching Fundamentals

The first video (created by Kelly Scribner and Alex Roberts) gives key information to consider when using OCR text recognition with Adobe Acrobat Pro for scanned paper. Though much has been written about the incredible functionality available with Adobe Acrobat Pro, this short seven minute demonstration focuses on points that we think are most important for you to consider when using OCR in Acrobat Pro.
