DeMystifying De-NIST

With ever-rising volumes of discovery data, legal teams are increasingly looking for solutions that can help them manage the amount of data they need to review.  In circumstances where significant amounts of ESI (Electronically Stored Information) and forensic images of hard drives are involved, one common method is to “De-NIST” discovery data sets.  “De-NIST”ing can be a significant time and money saver and an important part of the discovery review process.

So what the heck does “De-NIST” mean?  

NIST is the acronym for the National Institute of Standards and Technology.  One of NIST’s projects is the National Software Reference Library (website www.nsrl.nist.gov).  This project is designed to identify and collect software from various sources and create a Reference Data Set (RDS).  The RDS is a collection of digital signatures of known, traceable software applications.

A digital signature is like a digital fingerprint (it is also commonly referred to as a hash value).  A hash value is calculated from a file’s contents, so in practice two files with different contents will have different hash values.  If two files have the same hash value they are considered duplicates.
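To make the idea of a hash value concrete, here is a minimal Python sketch that fingerprints a file by its contents. The function name hash_file and the choice of SHA-1 as the default are assumptions made for illustration; the article does not say which hash algorithm a given review platform uses.

```python
import hashlib


def hash_file(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of a file's contents.

    The algorithm default (SHA-1) is an assumption for this sketch;
    any algorithm name accepted by hashlib.new() can be passed instead.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the hash is computed from the contents only, two copies of the same file produce the same value even if they have different names or sit on different computers.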

Most software applications consist of multiple files.  For example, when Adobe Acrobat Reader is installed, hundreds of standard files are copied to a computer’s hard drive.  All of these standard install files are the same (i.e., they have identical hash values) no matter what computer they reside on.  A typical computer contains hundreds of software applications.  The files associated with running these applications are not user generated and hold little evidentiary value for litigation purposes.  The NIST list is a database that contains over 28 million of these file signatures.

“De-NIST”ing is the process of identifying these files so that a decision can be made whether they should be set aside or removed from a discovery database.  The NIST list is compared to the file signatures of the data sets within the discovery.  Any file that has a signature that matches one in the NIST list can be “De-NIST”ed (identified or removed) from the collection.
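The comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular review platform works: the names sha1_hex and de_nist are hypothetical, the file names and contents are made up, and a small in-memory set stands in for the NIST list.

```python
import hashlib


def sha1_hex(content):
    """Return the uppercase SHA-1 hex digest of a bytes object."""
    return hashlib.sha1(content).hexdigest().upper()


def de_nist(files, known_hashes):
    """Split files into (kept, flagged) by membership in known_hashes.

    files: mapping of file name -> file contents (bytes)
    known_hashes: set of uppercase hex digests standing in for the NIST list
    """
    kept, flagged = [], []
    for name, content in files.items():
        if sha1_hex(content) in known_hashes:
            flagged.append(name)  # signature matches a known application file
        else:
            kept.append(name)     # no match, so the file stays in the review set
    return kept, flagged


# Hypothetical data: one standard install file and one user document.
known_hashes = {sha1_hex(b"standard install file contents")}
collection = {
    "reader_help.dll": b"standard install file contents",
    "contract_draft.txt": b"user-written text",
}
kept, flagged = de_nist(collection, known_hashes)
```

In this sketch, flagged holds the files whose signatures match the stand-in list and kept holds everything else.  Flagged files are only identified here, not deleted, which leaves the decision to set them aside or remove them to the review team.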

While many legal review teams expect the De-NIST process to get rid of every application or system file within a data collection, it is important to note that the NIST list does not contain every single system file.  Though it may not remove all of the system files, it can significantly reduce the dataset, especially when working with copies of hard drive images.

When presented with an overwhelming river of information, trying to find relevant information can feel like you’re panning for gold.  De-NIST’ing can help to identify or get rid of much of the water, stones and muck and leave you with a much more manageable pan.