dtSearch is a popular search and retrieval program. Here is a brief 12-minute video that demonstrates how to set up a new dtSearch index and how to run searches within an index.
As mentioned in the dtSearch Desktop post, we have been able to obtain a limited number of licenses that will be made available to CJA panel attorneys with current, active cases. To request a license, go to the dtSearch Desktop post and fill out the request form at the bottom.
Note: like most litigation software programs, this program was developed for Windows-based operating systems and does not work with Macintosh operating systems.
Limited licenses of dtSearch Desktop Available for CJA Panel Attorneys
We are pleased to announce that we are able to offer a limited number of dtSearch Desktop software licenses for CJA panel attorneys with current, active cases at no cost (a $200 value).
*PLEASE NOTE: dtSearch only works on Windows-based operating systems. It will not work on Mac computers unless you are running a virtual Windows operating system.*
dtSearch is a popular search and retrieval program that can be a useful tool for searching discovery and creating brief banks. It can be helpful in viewing different file types (including non-PDF files) even if you do not have the associated program installed on your computer. dtSearch is the search engine utilized in many familiar litigation support programs including Adobe Acrobat Pro, CaseMap and Forensic Toolkit (FTK, a computer forensic tool).
dtSearch allows users to search native files as well as scanned paper documents that are in a text searchable format. It generates a search index for each folder of materials the user designates and allows one to search the contents of an entire folder regardless of the amount of data or the number of varied file formats. As electronic discovery in federal criminal matters continues to grow in volume and in the variety of formats, dtSearch is a great resource for CJA panel attorneys faced with the daunting task of organizing and searching through their case material.
To obtain your free license of dtSearch, simply fill out the dtSearch Request Form below. Once you complete all of the fields, click on the “submit” button at the bottom of the form. This will automatically send an email with your completed form attached to Joe Wanzala, the National Litigation Support Paralegal in charge of dtSearch licensing. You will then receive an email within 5 business days with download instructions and an activation code. Each user license can be installed for that user on two machines.
You must have an active appointed case to utilize this license. If you are no longer on the panel and don’t have an active appointed case, we request you return the license to the National Litigation Support Team (NLST) by contacting Joe Wanzala so the license can be used by other CJA panel attorneys.
For technical support or if you have any questions regarding the utilization of dtSearch within your office, please contact either Alex Roberts, Joe Wanzala or Sammy Lopez. If you want to learn more about dtSearch, you can go to dtSearch.com.
When working with PDF documents you may encounter a “renderable text” error message. This message will sometimes occur when trying to make a scanned paper PDF file text searchable (also known as adding OCR to a document).
Depending on the version of Acrobat you have, the message may read something like:
“Renderable text” is typically text that has been added to a scanned paper image (like a header, footer or Bates number) through a non-Acrobat program. The way this text is encoded into the page can cause Acrobat to disallow additional searchable text (OCR text).
This message can certainly be annoying and it can also be significant as it can limit your ability to run searches. In Acrobat, you will be unable to add new searchable OCR text, or improve the quality of the existing OCR, until the error is fixed.
If you’ve seen this message before, and have tried to fix the document without success, you are not alone! We have spoken with a number of people over the years who have come up with some creative solutions. Though we have yet to find “one solution” that will always fix this particular error, here are a number of possible solutions (results will vary depending on the cause of the error):
Solution 1: Obtain a version of the document with OCR.
It may seem simplistic, but if you receive documents without searchable OCR, ask for a version that has it. Often the person or organization that gave them to you will want to search the files themselves and may already have a copy that has been OCR’ed. Even if the documents they give you generate “renderable text” error messages, you will still be able to search any of the existing OCR text within the files.
Solution 2: If the files are from PACER / ECF, download a new copy.
The default download settings in PACER / ECF will add “purple” headers with the case number (which will cause a “renderable text” error message). If you can find the document again in PACER / ECF, download it with the header option turned off.
Solution 3: Run “Add Tags to Document” (available in Acrobat Pro).
If you have Acrobat Pro installed, there is a special “Accessibility” menu where you can run “Add Tags to Document”. For certain PDFs, running this option will clear up the issue and allow OCR to be run on the document.
Solution 4: Print the document to PDF (available in Acrobat Standard and Acrobat Pro).
If you have Acrobat installed (Standard or Pro) you’ll probably also have access to an “Acrobat PDF” virtual printer. By printing the document to this virtual printer, the new PDF that is created will often avoid having the renderable text issue.
Solution 5: “Sanitize” the document then rerun OCR (available in Acrobat Pro).
From the “Protection” menu run “Sanitize Document”. This will remove all of the document metadata including some of the rendered text that might be causing the error.
Re-run the OCR process.
Solution 6: Convert to TIFF files and back, and then re-run OCR (available in Acrobat Standard and Acrobat Pro).
Open the PDF document in Acrobat and choose “File > Save As”.
In the “Save As” dialog box, choose TIFF (*.tif, *.tiff) from the Save As Type (Windows) or Format (Mac OS) pop-up menu. Specify a location, and then click Save. Acrobat saves each page of the PDF document as a separate, sequentially numbered TIFF file.
Combine the single pages back into a multipage document and re-run the OCR process.
Solution 7: Convert to XPS file format and back, and then re-run OCR.
If your computer has the “XPS” virtual printer installed (it comes with many versions of MS Office), then print the file using the “Microsoft XPS Document Writer” printer.
The XPS printer will ask you to save the file.
Convert the saved XPS file to PDF.
Re-run the OCR process on the new PDF.
Solution 8: Try running the OCR using a different program.
Adobe Acrobat Pro is one of the most popular computer software programs on the market for FDO and CJA panel attorneys. Since so much of the discovery we currently receive in criminal cases is provided in paper or scanned paper format, Acrobat Pro is an excellent tool to help you to better organize and review it.
In our team’s continued efforts to provide resources to CJA panel attorneys and FDO staff, we are creating a series of training videos. Each short video will address a specific feature in a computer software program, with our first set focused on Adobe Acrobat Pro XI.
Future videos we are developing will also be posted on this blog. Make sure to check back in or sign up to subscribe to our blog to get notices of new posts by email.
These videos do not take the place of hands-on training sessions where we can get in depth about a variety of software programs and legal strategies for addressing complex cases, but they will hopefully provide you with some basic background information that can help you in your cases.
With ever rising volumes of discovery data, legal teams are increasingly looking for solutions that can help them manage the amount of data they need to review. In circumstances where significant amounts of ESI (Electronically Stored Information) and forensic images of hard drives are involved, one common method is to “De-NIST” discovery data sets. “De-NIST”ing can be a significant time and money saver and an important part of the discovery review process.
So what the heck does “De-NIST” mean?
NIST is the acronym for the National Institute of Standards and Technology (website www.nsrl.nist.gov). One of NIST’s projects is the National Software Reference Library. This project is designed to identify and collect software from various sources and create a Reference Data Set (RDS). The RDS is a collection of digital signatures of known, traceable software applications.
A digital signature is like a digital fingerprint (it is also commonly referred to as a hash value). It is calculated from the contents of a file, so in theory every distinct file has a unique hash value. If two files have the same hash value they are considered duplicates.
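As a rough illustration of how a hash value behaves, here is a minimal Python sketch. It assumes the SHA-1 algorithm and uses made-up file names and contents; the helper names `file_hash` and `demo_duplicate_check` are hypothetical and are not part of any litigation support product.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest (hash value) of a file's contents."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large discovery files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo_duplicate_check():
    """Write three small sample files and compare their hash values."""
    with tempfile.TemporaryDirectory() as folder:
        original = Path(folder) / "memo.txt"
        copy = Path(folder) / "memo_copy.txt"
        edited = Path(folder) / "memo_edited.txt"
        original.write_bytes(b"Quarterly report")
        copy.write_bytes(b"Quarterly report")      # same contents, new name
        edited.write_bytes(b"Quarterly report.")   # one extra character
        return (
            file_hash(original) == file_hash(copy),
            file_hash(original) == file_hash(edited),
        )
```

In this sketch the renamed copy matches the original because only the contents are hashed, while the file with a single added character does not.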
Most software applications are composed of multiple files. For example, when Adobe Acrobat Reader is installed, hundreds of standard files are copied to a computer’s hard drive. All of these standard install files are the same (i.e., they have identical hash values) no matter what computer they reside on. A typical computer contains hundreds of software applications. The files associated with running these applications are not user generated and hold little evidentiary value for litigation purposes. The NIST list is a database that contains over 28 million of these file signatures.
“De-NIST”ing is the process of identifying these files so that a decision can be made whether they should be set aside or removed from a discovery database. The file signatures of the data sets within the discovery are compared against the NIST list. Any file whose signature matches one in the NIST list can be “De-NIST”ed (identified or removed) from the collection.
While many legal review teams expect the De-NIST process to get rid of every application or system file within a data collection, it is important to note that the NIST list does not contain every single system file. Though it may not remove all of the system files, it can significantly reduce the dataset, especially when working with copies of hard drive images.
When presented with an overwhelming river of information, trying to find relevant information can feel like you’re panning for gold. De-NIST’ing can help to identify or get rid of much of the water, stones and muck and leave you with a much more manageable pan.
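The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses SHA-1 hash values, the set of known hashes is a tiny hypothetical stand-in for the real NIST list, and the function names `de_nist` and `demo_de_nist` are invented for this example. Review platforms perform this step with their own built-in tools.

```python
import hashlib
import tempfile
from pathlib import Path


def sha1_of(path):
    """Return the SHA-1 hash value of a file's contents."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def de_nist(folder, known_hashes):
    """Split the files under folder into (kept, set_aside) lists.

    A file is set aside when its hash value appears in known_hashes,
    which stands in for the NIST list in this sketch.
    """
    known = {value.lower() for value in known_hashes}
    kept, set_aside = [], []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        if sha1_of(path) in known:
            set_aside.append(path.name)
        else:
            kept.append(path.name)
    return kept, set_aside


def demo_de_nist():
    """Build a two-file sample collection and run the comparison."""
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        (root / "letter.txt").write_bytes(b"Dear client, ...")
        (root / "reader_help.dll").write_bytes(b"standard install file")
        # Hypothetical one-entry "known software" list for illustration.
        known = {hashlib.sha1(b"standard install file").hexdigest().upper()}
        return de_nist(root, known)
```

Here the user-created letter stays in the review set and the file matching the known list is set aside, which mirrors the "identified or removed" choice described above.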
A “load file” is a special kind of file that you may encounter in sets of case related materials. While there are many different flavors of load files they all serve the same general purpose: they can be used by litigation support software to import (i.e. “load”) information about case related documents.
Document information may include:
Name and locations of image files (typically scanned paper files).
Document unitization information (i.e. document breaks).
OCR (searchable text) file names and locations.
Electronic document (ESI) file names and locations.
Extracted metadata information.
Other fielded document information.
Load files can play an important role in assisting with the setup of a case document database. When properly used, they can make the process of importing documents into litigation support applications faster and more efficient. Some programs that support the importing of load files include evidence review programs (like Summation, Concordance and IPRO) and trial presentation programs (like TrialDirector and Sanction).
Load files have different file extensions depending on the program they are designed to work with. When talking with litigation support vendors, or discussing the format of discovery with opposing counsel, it is important to recognize which load file formats work with your litigation support programs.
Some common file extensions of load files that you might encounter are:
.DII designed to work with AD Summation
.OPT designed to work with Concordance
.LFP designed to work with IPRO products
.OLL designed to work with TrialDirector
.SDT designed to work with Sanction
.DAT generic document information load file
.CSV generic document information load file
.XML new “EDRM” style load file format that works with many platforms
Many load files contain the path of image files associated with a record. They may also contain meaningful additional information about the documents. For scanned documents, this may include a Bates or control number, coded document information (like document type, date, title, etc.) and information about OCR (searchable text) files that might be associated with the document. Load files for electronic documents (ESI) may also include extracted metadata (associated information about the files such as author, date created, file size, etc.).
Most load files are simple lines of text that can be read by litigation support programs. When viewed in a text program like WordPad or MS Word we can see what the lines contain. Here is an excerpt from a sample .LFP load file (as seen in WordPad):
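To make the idea of fielded document information more concrete, here is a small Python sketch that reads a generic .CSV load file. The column names (BegBates, EndBates, DocType, DocDate, ImagePath, TextPath) and the sample rows are hypothetical; real load files use whatever field names the producing party or vendor chose.

```python
import csv
import io

# Hypothetical sample of a generic .CSV load file: one row per document.
SAMPLE_CSV = r"""BegBates,EndBates,DocType,DocDate,ImagePath,TextPath
ABC000001,ABC000003,Letter,2012-03-14,IMAGES\001\ABC000001.TIF,TEXT\001\ABC000001.txt
ABC000004,ABC000004,Memo,2012-04-02,IMAGES\001\ABC000004.TIF,TEXT\001\ABC000004.txt
"""


def read_load_file(text):
    """Return one dictionary per document, keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def documents_of_type(records, doc_type):
    """Return the beginning Bates number of each record of a given type."""
    return [row["BegBates"] for row in records if row["DocType"] == doc_type]
```

For example, `documents_of_type(read_load_file(SAMPLE_CSV), "Memo")` picks out the single memo in the sample by its coded document type.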
This particular load file contains information about document images. Litigation support software programs can read this file and know:
the record identifier (usually the bates number) of a document
where one document ends and another begins
where to find the scanned paper .TIF files associated with a document
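The three items above can be illustrated with a short Python sketch. It assumes a commonly seen layout for .LFP image lines (IM, image key, document break flag, offset, then @volume;folder;file name;type); that layout, the sample lines, and the function name `parse_lfp` are assumptions made for illustration, so check them against your own load file and the program's documentation.

```python
# Hypothetical sample lines in an assumed .LFP layout:
# IM,<image key>,<break flag>,<offset>,@<volume>;<folder>;<file name>;<type>
SAMPLE_LFP = r"""IM,ABC000001,D,0,@VOL001;IMAGES\001;ABC000001.TIF;2
IM,ABC000002,,0,@VOL001;IMAGES\001;ABC000002.TIF;2
IM,ABC000003,,0,@VOL001;IMAGES\001;ABC000003.TIF;2
IM,ABC000004,D,0,@VOL001;IMAGES\001;ABC000004.TIF;2
"""


def parse_lfp(text):
    """Group image lines into documents using the break flag."""
    documents = []
    for line in text.splitlines():
        if not line.startswith("IM,"):
            continue  # skip any line that is not an image record
        _, image_key, break_flag, _offset, location = line.split(",", 4)
        _volume, folder, filename = location.split(";")[:3]
        page = {"id": image_key, "path": folder + "\\" + filename}
        # A non-empty flag (assumed here to mark a break) starts a new document.
        if break_flag or not documents:
            documents.append({"first_page": image_key, "pages": []})
        documents[-1]["pages"].append(page)
    return documents
```

With the sample lines, the sketch yields two documents: a three-page document beginning at ABC000001 and a one-page document at ABC000004, each page carrying the location of its .TIF file.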
There may be times when you will receive multiple types of load files within the same set of documents. Some of the files may contain the same information, but are designed to work with different database programs. When working with vendors, let them know what litigation support database programs you intend to use so that they give you compatible load files.
In the event you receive load files that are not designed for your database program, you may need to convert the file to make it compatible. Fortunately there are a few free load file conversion programs available. Two such programs are:
To find out more about how load files can best be used to interact with your existing litigation support applications, refer to the help and support documents of the program. Quite often, these are the best resource for describing how load files interact with the case database and will often demonstrate the load file import process.