ChatGPT: Implications for Criminal Defense Litigation Practice

eDiscovery, or electronic discovery, is the process of identifying, collecting, and analyzing electronically stored information (ESI) in order to be used as evidence in legal cases. This process can be time-consuming and costly, as it often involves manually reviewing large amounts of data. However, advances in artificial intelligence (A.I.) have opened up new opportunities for streamlining the eDiscovery process. One such technology is ChatGPT, a large language model developed by OpenAI.

ChatGPT is a powerful tool for natural language processing (NLP) that can understand and generate human-like text. This makes it an ideal candidate for use in eDiscovery, as it can quickly and accurately analyze large amounts of ESI in order to identify relevant information. For example, ChatGPT can be used to identify specific keywords or phrases within a document, classify documents by type, or even summarize the content of a document.

The introductory paragraphs above were generated by ChatGPT in response to a request to write a blog post on ChatGPT and eDiscovery. This is an example of how ChatGPT can generate text in such a way that one cannot immediately tell whether it was written by a machine or a human. This blog post offers initial takes on the potential ramifications of ChatGPT and similar artificial intelligence (A.I.) tools for the work CJA panel attorneys and federal defenders do. It does not advocate any specific position regarding A.I. technology, which has wide-ranging and yet-to-be-realized implications in many fields. The goal is to provide a general idea of how this new A.I. technology might impact our work.

What is ChatGPT?

The current version of ChatGPT, 3.5, was released in late 2022 (openai.com/blog/ChatGPT). It is an artificial intelligence tool built on a natural language processing model known as a Generative Pre-trained Transformer (‘GPT’), or ‘generative A.I.’, developed by OpenAI. ChatGPT is well suited to generating human-like text to help solve problems. This can include answering questions, summarizing or translating large volumes of text, generating lines of code, or providing step-by-step, conversational instructions for a wide range of complex software applications.

ChatGPT is trained on a massive corpus of data drawn from many publicly available sources on the internet, including Google, the Wayback Machine, GitHub, WordPress, Wikipedia, and so forth. However, it is not connected to the internet in real time and has limited knowledge of the world and events after 2021. This means it can occasionally produce inaccurate information, a problem that OpenAI acknowledges (help.openai.com/en/articles/6783457-chatgpt-general-faq). In some instances it will tell you it doesn’t know; in others it will provide an answer with a disclaimer. It can also provide an authoritative-sounding answer that is wrong, without any qualifier. It has even been known to fill in the gaps with made-up information. For example, eDiscovery expert Ralph Losey asked the robot to identify the top five eDiscovery cases for 2022. Since it did not have any 2022 cases to reference, it ignored the date, listed only 2021 cases, and even made up the name of a judge! (ediscoverytoday.com/2023/01/02/ai-top-cases-of-2022-doesnt-include-any-cases-from-2022-artificial-intelligence-trends/)

In response to these sorts of user experiences, OpenAI recently sent out a tweet with warnings noting that ChatGPT is useful for general information in subject areas such as language, science, engineering, finance, history, and culture, and less suitable for high-context or niche areas such as legal advice and real-time events (twitter.com/openaicommunity).

Can ChatGPT be used for discovery review?

Artificial intelligence models based on natural language processing have been deployed extensively in eDiscovery for some time. Foremost among these approaches is Technology Assisted Review (TAR)[1], which uses algorithms to identify and highlight relevant information based on input from subject matter experts. This technique helps reduce attorney review time, thereby creating time, cost, and workflow efficiencies.

Since TAR and generative A.I. are both based on the natural language processing branch of artificial intelligence (Figure 1), one might assume that ChatGPT’s ability to generate human-like information about a broad and complex range of data sets could be easily applied to eDiscovery to enhance review methods such as TAR. Indeed, in the second introductory paragraph above, ChatGPT generated text that describes common eDiscovery tasks that artificial intelligence software can perform under the proper conditions. But it also wrote that it, ChatGPT, could do these types of tasks. While it is true that ChatGPT can perform these tasks based on information it has been trained on, it was not designed to perform eDiscovery tasks, and OpenAI has not developed a version of the GPT technology that can be utilized for eDiscovery. Furthermore, even if the underlying GPT-3.5 model could be adapted to an eDiscovery environment, the immense computing resources it currently requires, designed for vast amounts of data, would make it non-scalable and cost-prohibitive (law.com/legaltechnews/2023/01/25/what-will-eDiscovery-lawyers-do-after-chatgpt/).


Figure 1.

What can ChatGPT do right now?

ChatGPT has more direct application in terms of workflow and analysis. Discovery in criminal cases increasingly includes both structured (databases, spreadsheets) and unstructured (documents, videos, audio files, phone extractions, social media, emails) data. Currently, most workflows designed to integrate and synthesize these heterogeneous formats are necessarily cumbersome, requiring a patchwork of approaches. Many easily available open source tools (e.g. OpenRefine, referenced below) and applications such as Microsoft Excel, which can be helpful to practitioners, are under-utilized, if leveraged at all. ChatGPT has the potential to help bridge the gap between the utility of these applications and practitioners’ ability to utilize them.

For example, below (Figure 2) is a screenshot showing ChatGPT’s response to a question about importing a CSV file[2] into CaseMap (a fact and case organization and analysis tool; see nlsblog.org/2011/10/05/cja-panel-attorney-software-discounts). ChatGPT was able to provide a step-by-step guide on how to import a CSV file into CaseMap, and the feedback is helpful as far as it goes. Note, however, that the steps are general rather than specific, practical instructions for carrying out the import into a CaseMap database, and there are better and more efficient ways to import a CSV file into CaseMap than what ChatGPT prescribed. This is due to the limited information about CaseMap built into the OpenAI model.

Figure 2.

In our second example (Figure 3), we see how ChatGPT can help us deal with CSV files containing ‘messy’ data, in this case duplicate rows in a spreadsheet. It provided guidance on how to utilize a tool called OpenRefine (openrefine.org) to ‘clean up’ the spreadsheet.

Figure 3.

Since OpenRefine is a free, open source tool, ChatGPT was able to provide more accurate information than one might expect when dealing with ‘closed’, proprietary tools such as CaseMap.
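For readers comfortable with a little scripting, the same kind of duplicate-row clean-up can also be sketched in a few lines of Python. This is a minimal sketch using only Python’s standard library, not the OpenRefine procedure ChatGPT described. The function name and the sample rows are made up for illustration, and it treats only rows that match exactly in every column as duplicates.

```python
import csv
import io


def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping the first copy of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data standing in for a spreadsheet saved as CSV.
SAMPLE_CSV = (
    "date,sender,subject\n"
    "2021-03-01,a@example.com,Hello\n"
    "2021-03-01,a@example.com,Hello\n"
    "2021-03-02,b@example.com,Update\n"
)

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
cleaned = drop_duplicate_rows(rows)

# Write the cleaned rows back out as CSV text.
output = io.StringIO()
csv.writer(output, lineterminator="\n").writerows(cleaned)
print(output.getvalue())
```

Rows that differ only by stray spaces or capitalization would not be caught by this sketch; that kind of near-duplicate is where a tool such as OpenRefine is more helpful.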

Conclusion

The need to harness software to work our cases effectively will only increase as data complexity continues to ratchet up. ChatGPT can help facilitate the utilization and adoption of open source and business applications in response to these challenges, lowering the barrier to access by providing on-demand, human-like support to practitioners. This can help with the ‘trees’ we believe are relevant to our cases, e.g. a subset of files responsive to a search query. This still leaves the ‘forest’: the large tranches of discovery that we load into review platforms such as Eclipse SE and Casepoint to parse and organize the data. Whether or how the generative A.I. technology underlying ChatGPT will have an impact in this latter arena remains to be seen.


[1] Also known as predictive coding, computer assisted review, or supervised machine learning.

[2] A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record, and usually consists of tabular data from a database. The CSV file format is supported by a wide variety of business applications including MS Excel (en.wikipedia.org/wiki/Comma-separated_values).

Working with Email in Discovery: Processing Options and Review Workflows

Introduction

Technologies that allow for easier review of eDiscovery in native format have become more affordable and accessible. Working with files in native format has several advantages, including avoiding loss of potentially relevant information, access to metadata, and better searchability. Email is one of the most common of the native formats produced in discovery. This article will explore some approaches for processing email and identify a number of low-cost tools that can assist. (This article deals with the processing but not the substantive review of emails for case analysis. For that, you should consider other tools such as CaseMap or, for larger collections of emails, review platforms such as Casepoint or IPRO.)

The tools and approaches you select will depend on a combination of three factors: (1) volume, (2) format(s) and (3) the defense team’s goals. While a single tool might facilitate a discrete goal, more involved goals may require different approaches with a combination of tools. These scenarios can be ends in themselves or phases in an overall workflow. This article does not try to anticipate every possible situation that might arise but will explore a few common scenarios.

Many electronic file formats produced in the course of discovery, like Acrobat, Excel and Word files, are generally accessible via standard software available on most computers. However, email file formats like MSG, EML, PST, and MBOX present more of a challenge, as the recipient often may not know how to access these files.

Below is a quick overview of some of the most common email file formats encountered in eDiscovery that will be discussed in this article:

  • MSG: A Microsoft format for single emails. Often associated with the Microsoft Outlook email client.
  • PST: A Microsoft format for a collection of emails (as well as other potential items including: Calendars, Contacts, Notes and Tasks). Often associated with the Microsoft Outlook email client.
  • EML: Email format for single emails used by many email clients including Novell GroupWise, Lotus Notes, Windows Mail, Mozilla Thunderbird, and Postbox.
  • MBOX: Email format for a collection of emails (as well as other potential items including: Calendars, Contacts, Notes and Tasks) used by many email clients including Novell GroupWise, Lotus Notes, Windows Mail, Mozilla Thunderbird, and Postbox.

All four formats are typically received in discovery and subpoena returns. Google Takeout, a service offered by Google which allows you to download your email, will produce emails in the MBOX format.

Working with these email formats consists of understanding which tool is compatible with which file format, and which tool or set of tools will most effectively allow you to achieve your goals. Below is a table that maps out some of the various tools available in terms of which file formats they are able to process, their functionality and cost. Before using any of these tools, make sure to work with a copy of the data as opposed to the original.

Software | Compatible Formats | Cost | Functionality
Mozilla Thunderbird with the Import Export Tools add-on | EML, MBOX | free | View emails, convert to EML, HTML, MBOX and PDF (without attachments)
Mbox Viewer | EML, MBOX | free | View emails, convert to HTML or PDF (without attachments)
PSTViewer Pro | MSG, EML, PST, MBOX | $129 | View emails, convert to multiple formats including EML, HTML, MBOX and PDF (includes advanced PDF attachment image options)
MS Outlook | MSG, EML, PST | $159 or $69.99 per year | View emails, export to MSG, PST and PDF (requires Acrobat integration)
Aid4Mail | MSG, EML, PST, MBOX | $299 per year | Convert email to multiple formats including MSG, HTML, EML, PST, MBOX and PDF
dtSearch | MSG, EML, PST, MBOX | $199 or *free | Search and view results in email viewer panel (no conversion or export options)

*For information about a free license of dtSearch available to CJA Panel Attorneys see: nlsblog.org/2014/03/25/dtsearch-desktop

This article will demonstrate how to work with emails through a series of discrete tasks:

  1. Generating a list of emails for review.
  2. Viewing emails.
  3. Searching, tagging, and converting emails.
  4. Working with email attachments.

1. Generating a list of emails for review.
An initial task at the outset of a case might be to generate an index to facilitate early case assessment. Some programs, like PSTViewer Pro, will work with many formats while other programs, like Mbox Viewer, work with a more limited number of formats.

  • Example 1 – Generating a list using Mbox Viewer:
    Mbox Viewer is a free tool that allows you to preview emails and generate a list of emails by selecting messages in the viewer, right-clicking, selecting print to CSV, and then choosing which fields you would like to include in the spreadsheet (Figure 1-1).
Figure 1-1
  • The resulting CSV file contains a table that can be opened in Excel or imported into other programs (Figure 1-2).
Figure 1-2
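
A similar index can also be produced with a short script. The sketch below is not how Mbox Viewer works internally. It is a minimal illustration, using Python’s standard `mailbox` and `csv` modules, of pulling a few header fields from each message in an MBOX file into CSV rows. The choice of fields, the function names, and the sample messages (built in a temporary file so the example is self-contained) are all assumptions made for illustration. As with the tools above, point a script like this at a copy of the data, not the original.

```python
import csv
import io
import mailbox
import os
import tempfile
from email.message import EmailMessage

# Header fields to include in the index (an illustrative choice).
FIELDS = ["Date", "From", "To", "Subject"]

# Hypothetical messages used only to build a small sample MBOX file.
SAMPLES = [
    ("Mon, 01 Mar 2021 09:00:00 -0500", "alice@example.org", "Meeting notes"),
    ("Tue, 02 Mar 2021 14:30:00 -0500", "bob@example.org", "Invoice attached"),
]


def build_sample_mbox(path):
    """Create a tiny MBOX file so the sketch can run without real data."""
    box = mailbox.mbox(path)
    try:
        for date, sender, subject in SAMPLES:
            msg = EmailMessage()
            msg["Date"] = date
            msg["From"] = sender
            msg["To"] = "recipient@example.com"
            msg["Subject"] = subject
            msg.set_content("Sample body text.")
            box.add(msg)
        box.flush()
    finally:
        box.close()


def mbox_to_rows(path):
    """Return one row of header values per message in the MBOX file."""
    box = mailbox.mbox(path, create=False)
    try:
        return [[str(msg.get(field, "")) for field in FIELDS] for msg in box]
    finally:
        box.close()


def write_index(rows, out_file):
    """Write a header line followed by one CSV line per email."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.mbox")
    build_sample_mbox(sample_path)
    rows = mbox_to_rows(sample_path)

buffer = io.StringIO()
write_index(rows, buffer)
print(buffer.getvalue())
```

To save the index for use in Excel, the same `write_index` function could be given a file opened for writing instead of the in-memory buffer.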

2. Viewing emails.
While a list will provide you with a high-level overview of the emails you have in terms of subject matter, players involved and so forth, a closer review will require a different approach. MS Outlook, Mbox Viewer and Mozilla Thunderbird are all tools which can be utilized for this purpose.

  • Example 2.1 – Viewing emails received in PST format using MS Outlook:
    Within Outlook, open the ‘File’ menu, select the ‘Open & Export’ button, then ‘Open Outlook Data File’. Navigate to the folder containing the PST file (Figure 2-1) and select the file to import. Outlook will create a folder within the ‘Personal folders’ from which you can conduct a review of the files.
Figure 2-1
  • Example 2.2 – Viewing emails received in MBOX format using Mozilla Thunderbird with the Import Export Tools add-on:
    The free ‘Import Export Tools’ add-on available for Mozilla Thunderbird allows for the import and viewing of MBOX files. After the add-on has been installed, right-click on ‘Local Folders’, then choose ‘Import mbox file’ from the ‘ImportExportTools NG’ menu and navigate to the folder containing the MBOX file (Figure 2-2). This will copy the MBOX file into Thunderbird’s ‘Local Folders’ where, similar to Outlook, you can conduct a review of the emails within.
Figure 2-2

3. Searching, tagging, and converting emails.
The approaches discussed in the two previous sections can be useful when you simply want to gain a high-level view of the emails, or take a closer look at particular emails in a smaller collection. However, when you are working with large volumes of emails, manual review becomes impractical and inefficient, and taking advantage of the search and tag functionality of the available tools is a better approach.

  • Example 3 – Searching, tagging and exporting within MS Outlook:
    Outlook can be utilized to conduct keyword searches, and relevant files can be tagged and exported as either MSG or PDF files (using the Acrobat integration that is included with licensed copies of Acrobat Standard and Pro). To tag an email, right-click it and select ‘Categories’, then select a color-coded tag (Figure 3-1). You can also customize the tags using the ‘New Category’ option within the ‘Category’ dialog box (Figure 3-2).
Figure 3-1
Figure 3-2
  • You can then filter and tag a selection of emails (Figure 3-3) and save them to a folder as either individual MSG files or a new PST file. If you have a licensed version of Adobe Acrobat, the Acrobat integration menu within Outlook can be used to convert messages into individual PDFs or a combined ‘PDF Portfolio’ (Figure 3-4).
Figure 3-3
Figure 3-4
  • When choosing an export format, be aware of the limitations of the different conversion formats. The HTML and PDF export formats typically will not include the complete email metadata. Email header information, which may include important details such as the IP addresses used, may be lost during conversion. Export formats such as MSG, EML, MBOX and PST retain much more of the original email metadata.
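
To make the point about header metadata concrete, here is a minimal sketch, using Python’s standard `email` library, of reading the ‘Received’ headers of a message kept in EML form and pulling out the bracketed IP addresses. The sample message, its addresses, and the function name are invented for illustration (the IP addresses come from ranges reserved for documentation), and real headers vary widely in layout, so treat the pattern matching as an assumption rather than a reliable parser.

```python
import re
from email import message_from_string, policy

# Hypothetical EML content. Each mail server that handles a message
# typically adds its own 'Received' header above the earlier ones.
RAW_EML = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.com with ESMTP id abc123;
\tMon, 01 Mar 2021 09:00:05 -0500
Received: from laptop.local (unknown [198.51.100.7])
\tby mail.example.org with ESMTPSA id def456;
\tMon, 01 Mar 2021 09:00:01 -0500
From: sender@example.org
To: recipient@example.com
Subject: Status update
Date: Mon, 01 Mar 2021 09:00:00 -0500

Body text.
"""

# Matches an IPv4-style address written in square brackets.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message):
    """Return bracketed IPv4 addresses found in the Received headers, top to bottom."""
    msg = message_from_string(raw_message, policy=policy.default)
    ips = []
    for hop in msg.get_all("Received", []):
        ips.extend(BRACKETED_IPV4.findall(str(hop)))
    return ips


print(received_ips(RAW_EML))
```

A PDF or HTML rendering of the same message would usually show only the From, To, Subject and Date lines, which is why it is worth keeping a copy in a format such as EML or MSG alongside any converted version.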

4. Working with email attachments.
Emails frequently have attachments, which, in addition to the body of the email, can contain substantive relevant information. The programs discussed in this post vary greatly in how attachments are handled during format conversion. While PDFs are generally easier to add Bates stamps to or turn into exhibits, be aware that some of the programs are not able to include the attachments when exporting to PDF.

  • Example 4.1 – Exporting email with attachments using Mozilla Thunderbird with the Import Export Tools add-on:
    Thunderbird offers several export options, including the ability to batch export relevant emails when using the Import Export Tools add-on. It does not have the ability to embed or append attachments when exporting messages to PDF; however, it does allow emails to be exported to the EML format (with attachments embedded) as well as to an HTML format, which will include links to exported copies of the attachments (Figure 4-1).
Figure 4-1
  • Example 4.2 – Exporting email with attachments using PSTViewer Pro:
    PSTViewer Pro is yet another option for format conversion, and is a great tool to use in conjunction with tools like Thunderbird or Outlook. It can convert to many formats and includes some advanced PDF conversion options. When converting to PDF, attachments can either be embedded or “imaged” (Figure 4-2). The “imaged” option will convert supported attachments into PDF pages and append them to the PDF version of the email (Figure 4-3).
Figure 4-2
Figure 4-3
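
Because the EML format keeps attachments embedded in the message itself, they can also be pulled back out with a short script. The sketch below uses Python’s standard `email` library and is offered only as an illustration, not as a description of how Thunderbird or PSTViewer Pro handle attachments. The sample message, file name, payload, and function names are hypothetical, and the sample is built in memory so the example is self-contained.

```python
import os
import tempfile
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Placeholder bytes standing in for a real attached file.
SAMPLE_PAYLOAD = b"%PDF-1.4 placeholder bytes"


def build_sample_eml():
    """Build a small message with one attachment and return it as EML bytes."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Report attached"
    msg.set_content("Please see the attached report.")
    msg.add_attachment(
        SAMPLE_PAYLOAD,
        maintype="application",
        subtype="pdf",
        filename="report.pdf",
    )
    return msg.as_bytes()


def save_attachments(eml_bytes, out_dir):
    """Write each attachment in the message to out_dir and return the file names."""
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    saved = []
    for index, part in enumerate(msg.iter_attachments(), start=1):
        # basename() drops any folder path included in a sender-supplied name.
        name = os.path.basename(part.get_filename() or "") or f"attachment_{index}"
        with open(os.path.join(out_dir, name), "wb") as handle:
            handle.write(part.get_payload(decode=True))
        saved.append(name)
    return saved


with tempfile.TemporaryDirectory() as tmp:
    names = save_attachments(build_sample_eml(), tmp)
    sizes = {name: os.path.getsize(os.path.join(tmp, name)) for name in names}

print(names, sizes)
```

This sketch would overwrite files when two attachments share a name, so a real workflow would need a naming scheme that ties each saved attachment back to its parent email.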

Conclusion

As shown in this article, a multiplicity of tools is available for working with emails, but they are not universally compatible with all email formats and do not all have the same functionality. This requires careful thought about how to leverage and integrate the tools. The best path forward through this thicket is to know what your goals are before you select your tool. Defining your goal early will help you select which tool or combination of tools you should use to develop an effective workflow that matches both the set of data you are working with and the needs of your case.