Inside The Black Box: Excluding Evidence Generated by Algorithms

[Editor’s Note: John C. Ellis, Jr. is a National Coordinating Discovery Attorney for the Administrative Office of the U.S. Courts, Defender Services Office. In this capacity, he provides litigation support and e-discovery assistance on complex criminal cases to defense teams around the country. Before entering private practice, Mr. Ellis spent 13 years as a trial attorney and supervisory attorney with Federal Defenders of San Diego, Inc. He also serves as a digital forensic consultant and expert.]

Introduction:

For many years, law enforcement officers have used records generated by mobile carriers to place a mobile device in a general area. The records are called Call Detail Records (“CDRs”). CDRs are generated when a mobile device sends or receives calls and text messages. Mobile carriers likewise keep records of when data is used, such as browsing the internet. These records are called Usage Detail Records (“UDRs”). At times, the records generated by mobile carriers include the location of the cell site or cell sites and the direction of antenna that connected with the mobile device.

Cell Site Location Information (“CSLI”) is the practice of creating maps showing the possible coverage area of a cell site at the time a device was being used. For these purposes, it is important to keep in mind that the records only show the location of the cell site and the direction the antenna is facing. Recent technological improvements have resulted in mobile carriers now generating Enhanced Location Records (“ELRs”), which purport to show more precise location data. In AT&T parlance, such records are based on the Network Event Location System (“NELOS”). This location data is derived from proprietary algorithms.

In a recent federal case, the government, through a member of the Federal Bureau of Investigation’s (“FBI”) Cellular Analysis Survey Team (“CAST”), sought to introduce NELOS records in a trial. However, after a Daubert hearing where the CAST agent testified, the district court excluded the records, in part, because of concerns over the reliability of the algorithms used to determine the location data.

This article provides an overview of CSLI and NELOS records, discusses the order excluding NELOS records from trial, and provides practical advice for practitioners.

Overview:

When CDRs include cell site location data, analysts and law enforcement officers use these records to show the location of the cell site and the orientation of the sector. In North America, many cell towers contain three sets of antennas, with each set offering specific coverage area.

Picture 1

To illustrate this point, Picture 1 is an overview picture of a multi-directional cell tower. Each blue arm is a sector. When a mobile device connects to a cell site, the mobile carrier often records the activity (i.e., a sent text message), the time of the activity, and the location of the cell site and sector that was used.

Using these three data points, analysists and law enforcement officers create maps showing the location of the cell site and the orientation of the sector. In Map 1, the arms are used to demonstrate the beamwidth of the sector, which in this case records indicate is 120-degrees. The cone at the base of the triangle is only meant to show the orientation of the sector, not coverage area. Moreover, analysts generally will not testify that the mobile device was within the triangle. The triangle is only meant to represent the location of the cell site and the orientation of the sector.

Map 1

With NELOS records, on the other hand, the ELRs purport to show the location of a device as opposed to the location of the cell site. In the following example, the red pin represents the location of the device. The blue circle represents what AT&T calls the “Location Accuracy.” This accuracy ranges from approximately several meters to 10,000 meters. And some records are marked by “location accuracy unknown.” As discussed below, the Location Accuracy is determined by proprietary algorithms used by AT&T.

Map 2

In Map 2, the ELR indicates that the “[l]ocation accuracy [is] likely better than 300 meters.” In other words, the phone was at the red pin or within the blue circle at a specific date and time. NELOS records, however, contain the following statement: “The results provided are AT&T’s best estimate of the location of the target phone. Please exercise caution in using these records for investigative purposes, as location data is sourced from various databases, which may cause the location results to be less than exact.” DE 156 at 23 (emphasis added).

To put the first two examples into perspective, Map 3 shows both traditional CSLI and the use of NELOS records.

Map 3

The NELOS demonstrative, even taking account of the “Location Accuracy,” still provides a much smaller, and thus more specific, area of where the phone activity took place.

United States v. Smith, et al. (4:19-CR-514-DPM) (EDAR):

Donald Smith and Samuel Sherman were charged in a five-count indictment with various crimes relating to a murder. See Docket Entry (“DE”) 1. The government sought to introduce the testimony of CAST Agent Mark Sedwick “that provider-based location data typically is collected by obtaining historical call detail records for a particular cellular telephone from the service provider, along with a listing of the cell tower locations for that service provider.” DE 102 at 1. According to the government, “[t]his data is then analyzed for the purpose of generally placing a cellular telephone at or near an approximate location or locations on a map at points in time.” Id.

The government sought to have Agent Sedwick testify “regarding the activity and approximate locations of the cellular telephones believed to have been utilized by Donald Bill Smith, Samuel Sherman, Racheal Cooper and Susan Cooper on the approximate dates and times relevant to the charges in the Indictment.” Id. at 1-2. Attached to the government’s motion is the report created by Agent Sedwick. Maps 4 and 5 are examples from Agent Sedwick’s report. Map 4 shows how Agent Sedwick mapped traditional CSLI, and Map 5 shows how he mapped the same time period using NELOS records:

 

Map 4
Map 5

Map 4 shows traditional CSLI mapping with the location of the cell site and the orientation of the sector. With Map 5, each circle represents the area in which the device was used. Here, there are four such events. For comparison, in Map 4, Agent Sedwick’s opinion is limited to testifying about the location of the cell site and the orientation of the sector, whereas with Map 5, the testimony is the mobile device is within the circle.

Prior to trial, defense counsel challenged Agent Sedwick’s potential testimony and the district court conducted a hearing to determine the admissibility of the records pursuant to Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579 (1993). During the hearing, Agent Sedwick explained the reason AT&T created NELOS was to “test the health of the 3G network for planning and troubleshooting. It is a passive system where, while the phone is on the control channel communicating with the network across the control channel, it would passively pull whatever location data it could pull or data to compute location from that device.” DE 156 at 8.

Agent Sedwick further explained: “NELOS also became the generic term for any kind of location data. So depending, there might be other databases that were also pulled into the NELOS report that we receive from AT&T. Just from that report there’s no way to determine what other databases that was pulled from.” DE 156 at 9.

Agent Sedwick also provided information about known issues with NELOS data, specifically based on Temporary Mobile Subscriber Identity (“TMSI”). By way of background, mobile devices are assigned an International Mobile Subscriber Identity (“IMSI”), a unique number used by mobile carriers, which establishes that the mobile device can operate on a specific network. This is the number used by mobile carriers when creating CDRs. At times, however, in order to mask a device’s actual IMSI, networks assign the device a TMSI.[1] This is problematic for NELOS records because as Agent Sedwick explained, “[t]hat TMSI sometimes can get reallocated and then allocated back to a device, so you can have sometimes where the NELOS data will pull from a different device and get put into the records for the device that you’re requesting.” DE 156 at 10.

During cross-examination, Agent Sedwick was questioned about the portion of NELOS records that “caution in using these records for investigative purposes.” Agent Sedwick responded: “I wouldn’t rely on it if all I had was a NELOS point putting someone at a scene and that’s all I had, no, I would not use it. I’m using it—there is a caution with it, but I’m using it in the context of I have call and text to support it, I have other data to support, I have very good precise NELOS data. I feel very, very confident that this is accurate.” DE 156 at 24.

Agent Sedwick’s confidence in the accuracy of NELOS records was based on the proprietary algorithms created by the phone company. See DE 156 at 12 (“Question: Okay. So the device is sending various different events, they’re plugged into that algorithm, and essentially the algorithm will spit out what it computes as accuracy; is that correct? Answer: Yes, ma’am”). But Agent Sedwick acknowledged that he was not privy to the algorithm, nor whether NELOS was tested by AT&T for reliability. Instead, Agent Sewick testified he believed the algorithms are reliable “[b]ecause AT&T relies on that to make multi-million-dollar decisions on how they’re going to design their network.” DE 156 at 32.

In granting the defense’s motion to exclude NELOS data, the district court found:

What particularly concerns me, though, is this mystery algorithm that our—and the proprietary software. We don’t know, I don’t know exactly what is in the algorithm, and the agent gave some testimony at a general level about the kind of information that goes in, but it seems to me that I’m missing a—an important foundational stone there of something with more specificity as to the kinds of things that the algorithm uses and how the algorithm does its work.

We know that there are disturbances from time to time, or anomalies as was called, with the TMSI number. I also—I acknowledge some uncertainty about TMSI numbers and how many devices that might be connected with and how it is that the algorithm might deal with that. So there’s that. Then there is, in my view, almost a—so we’ve got our black box there, which is concerning, and I would say at this point there’s a peer review problem, as well, because I don’t have any scholarly literature or evaluation of the black boxes or the kind of things that could go into this black box and how it would work.

I understand about the corroboration, but I still find myself at sea of understanding how it is the—how things happen in the black box and whether—whether what comes out of the black box is sufficiently reliable that the jury can rely on it.

DE 156 at 85-87 (emphasis added).

Based on this, the district court entered the following order: “Agent Sedwick may testify about call detail records and historical cell-site analysis; but he may not testify about NELOS data and analysis.” DE 154.

Further Consideration:

The district court’s exclusion of NELOS records was based, in part, on the use of data generated by untested algorithms. Other mobile carriers also use ELRs, which generate purported location data that are also based on proprietary algorithms similar to NELOS. In seeking to exclude ELRs, as well as other forms of computer-generated data, counsel should encourage courts to question the reliability of evidence created by algorithms that lack independent validation and verification.

Glossary:

Acronym Full Title
CASTCellular Analysis Survey Team
CDRCall Detail Records
CSLICell Site Location Information
ELREnhanced Location Records
IMSIInternational Mobile Subscriber Identity
NELOSNetwork Event Location System
TMSITemporary Mobile Subscriber Identity
UDRUsage Detail Records

[1] As explained by EFF, “upon first connecting to a network, the network will ask for your IMSI to identify you, and then will assign you a TMSI … to use while on their network. The purpose of the pseudonymous TMSI is to try and make it difficult for anyone eavesdropping on the network to associate data sent over the network with your phone.” See https://www.eff.org/wp/gotta-catch-em-all-understanding-how-imsi-catchers-exploit-cell-networks.

Google Data and Geofence Warrant Process

[Editor’s Note: John C. Ellis, Jr. is a National Coordinating Discovery Attorney for the Administrative Office of the U.S. Courts, Defender Services Office. In this capacity, he provides litigation support and e-discovery assistance on complex criminal cases to defense teams around the country. Before entering private practice, Mr. Ellis spent 13 years as a trial attorney and supervisory attorney with Federal Defenders of San Diego, Inc. He also serves as a digital forensic consultant and expert.]

Introduction

We all know that Google is tracking us. But what does that actually mean? What exact data are they “tracking,” how are they doing it, and for those of us who are representing clients in federal court, how is law enforcement getting that data from Google and using it in their prosecutions?

This blog post will try to give you some answers to these questions. The purpose of this post is threefold: first, to provide a primer on how Google collects location data; second, to explain the three-step warrant process used by law enforcement to obtain these records; and third, to give an example of how the data is collected and used by law enforcement. Note this guidance is based on publicly available information, including recent court opinions. To date, there has not been an opportunity for defense attorneys to seek discovery from Google or to question a qualified representative from Google about their methods of collecting location data. 

What Can Google Do?

Google began collecting location data in order to provide location-based advertisements to its users. Location data is tracked by Google from users, including from consumers who use Android telephones and those who use Google’s vast array of available apps on other devices, including Apple iPhones. For Android devices, Google is constantly tracking devices whenever the permission settings on the device are set to allow for the use of Google Location Accuracy. For iOS users, location information is only collected when a user is using a Google product, such as Google Maps.[i]

Google can determine the approximate location of a device based on GPS chips in the device, as well as the device’s proximity to Wi-Fi hotspots, Bluetooth beacons, and cell sites.[ii] For Wi-Fi and Bluetooth, Google already knows the location of hotspots and Bluetooth beacons. When a device detects an available Wi-Fi network, for instance, it records and sends the unique serial number to Google.  Since Google has previously connected the physical location of many such hotspots with the unique identifier, Google assumes that if you are in range of a Wi-Fi hotspot, you should be sent advertisements for businesses in that area.

How Google tracks this data depends on the type of device (Android v. Apple) and an individual user’s privacy settings.[iii] Google cannot determine the exact location of a device, and as such, location records contain an “uncertainty value” which is expressed in meters. This service, called Sensorvault, was designed by Google to sell location-based advertisements.

Maps Display Radius

Although Google does not know a device’s precise location, it often has an idea where the device is located, which is represented by one or more spheres, or what Google calls the Maps Display Radius.

For example, in this picture, the dark blue circle in the middle is Google’s best guess about the actual location of a device. According to Google, its “goal is that there will be an estimated 68% chance that the user is actually within” spherical representation.[iv] 

But Google is not always sure the user is actually in the small blue circle; the area indicated by a larger sphere, outlined in white in this example, represents Google’s guess as to where the user may actually be. 

This makes sense considering the goal of Sensorvault is to provide location-based advertisements.  For this purpose, if a user is within several blocks of a location, the location-based advertisement succeeds.  This becomes relevant because the government claims it is the same procedure used in producing location data to law enforcement.[v]

It is useful to see how Google determines the approximate location of a device by looking at the Location History of a Google account. In this example, according to Google, the blue line indicates the path of travel; the orange dots represent the source of the location data; and the grey sphere next to the blue arrow is the estimated range of the location source. Google determines the line of travel based on the proximity to the sources of location data.

Generally, the location information source has the biggest impact on the Maps Display Radius. Among GPS chips in phones, Bluetooth beacons, Wi-Fi hotspots, and Cell Sites, GPS provides the smallest sphere whereas Cell Sites are generally the largest. In other words, GPS location is generally the most accurate of the major location information sources, and Cell Sites are the least accurate. For example, the map display radius for GPS is often only a few meters, while locations based on cell sites routinely have radiuses of over 1000 meters.

Use of Google’s Tools by Law Enforcement – Three-Step Warrant Process

Although the original intent of Google’s Sensorvault technology was to sell location-based advertising more effectively, over the past few years this data has been sought by law enforcement to determine who was present in a specific geographical area at a particular time, such as when a crime has been committed. These warrants are often called “geofence warrants” because officers seek information regarding devices which were contained with a geographic area at a certain time.

Google currently requires law enforcement to obtain three separate warrants to access the information.[vi] The first two warrants seek an anonymized list of devices within specific coordinates at specific times. The specific locations are defined as a radius or a polygon. The third warrant provides information about the owner of the accounts associated with a specific device.

First and Second StepsExample

In response to the first warrant, Google provides the following data: (1) anonymized user identifiers; (2) date and time the device was in the geofence; (3) approximate latitude and longitude of the device; (4) what Google deems its map display radius; and (5) the source of the location data. The warrant returns warn that the Maps Display Radius field reflects an estimated uncertainty value regarding the reported coordinates with the range depending on numerous factors and that the location approximation is intended for the product’s use.[vii]

As for the second step, after reviewing responses to the first warrant, “[i]f additional deidentified location information for a device in the production is necessary to eliminate false positives or otherwise determine whether that device is actually relevant to the investigation, law enforcement can compel Google to provide additional contextual location coordinates beyond the time and geographic scope of the original request….”[viii] 

For example, In the Matter of the Search of information that is stored at premises controlled by Google, 1600 Amphitheatre Parkway, Mountain View, California 94043, 18MJ191-DEJ (EDWI 2018), law enforcement officers investigating a bank robbery sought information about “all Google accounts” located within a 30 meters radius around 43.110877, -88.337330 on October 13, 2018 from 8:50 a.m. to 9:20 a.m. CST.  The red radius in the following example shows boundaries of the geofence warrant.

Another example is In the Matter of the Search of Information Regarding Accounts Associated with Certain Location and Date Information, Maintained on Computer Servers Controlled by Google, Inc., 18MJ169-ML (WDTX 2018).Law enforcement officers investigating a series of bombings sought location information for “all Google accounts” for a 12-hour period between March 1 and 2, 2018 in a “[g]eographical box” around 1112 Haverford Drive, Austin, Texas, 78753 containing the following coordinates: (1) 30.405511, -97.650988; (2) 30.407107, -97.649445; (3) 30.405590, -97.646322; and (4) 30.404329, -97.647983.  The boundaries of the geofence in the following picture are highlighted in blue.

Third Step

The third step involves compelling Google “to provide account-identifying information for the device numbers in the production that the government determines are relevant to the investigation. In response, Google provides account subscriber information such as the email address associated with the account and the name entered by the user on the account.”[ix]

Starting from the Beginning – How the Process Works

For example, a crime occurs in the parking lot of a strip mall.

Because the crime happens in the middle of a parking lot, law enforcement would create a geofence, which would include storefronts since that would increase the chances a suspect’s device would interact with a Wi-Fi hotspot or Bluetooth beacon; it also means many more people unconnected to the offense would have their information captured.

Although the above geofence appears to impact only people who are present in the parking lot or surrounding businesses, it would likely capture the personal data of people living in the nearby apartments and those driving on the surrounding streets.  The list of deice identifiers and location points for such a geofence warrant would likely be extensive; the following is an example of a warrant return, with a more limited dataset:

Device IDDateTimeLatitudeLongitudeSourceMaps Display Radius (m)
12345678912/20/2015:08:45(-8:00)32.752667-117.2168GPS5
98765432112/20/2015:08:55(-8:00)32.751569-117.216647Wi-Fi25
14785236912/20/2015:08:58(-8:00)32.752022-117.216369Cell1000
12345678912/20/2015:09:47(-8:00)32.752025-117.216369Cell800
98765432112/20/2015:09:55(-8:00)32.752023-117.216379Wi-Fi15
12345678912/20/2015:10:03(-8:00)32.752067-117.216368Wi-Fi25
98765432112/20/2015:10:45(-8:00)32.752020-117.216359Cell450
98765432112/20/2015:10:55(-8:00)32.752032117.216349Wi-Fi40
12345678912/20/2015:10:58(-8:00)32.752012117.216379Cell300

For Stage One and Two returns, the Device ID field contains an anonymized user identification number.  In a stage three warrant, law enforcement officers seek to user’s actual name.  The Date and Time fields reflect when a device was within the geofence.  The Latitude and Longitude fields reflect the coordinates of a device within the geofence.  The Source field indicates if the location data is based on GPS, Wi-Fi, or Cell.[x] Finally, the Maps Display Radius (m) field reflects the uncertainty of the location data represented in a sphere measured in meters.

In this example, Device ID 123456789 is Suspect One, Device ID 987654321 is Suspect Two, and Device ID 147852369 is Suspect Three.  For this example, only one location for each device is shown.

At first blush, it would appear as if the Geofence has located three possible suspects.  But this image does not tell the full story. The blue bubbles for Suspect One and Suspect Two show a Maps Display Radius of 5 and 25 meters respectfully.

Suspect Three’s location was derived from a Cell Site, with a Maps Display Radius of 1000 meters.

Thus, although Google believes that Suspect Three’s device was near the scene of the crime, it is possible it was located anywhere within the larger sphere, and maybe the device was not within either sphere.

Conclusion

As technology and privacy concerns of consumers continue to evolve, so will the ability of law enforcement to obtain location data of users. Using Google geofence warrants implicates several Fourth Amendment issues; future posts will explore the legal implications surrounding the overbreadth of these warrants.[xi] But beyond the legal challenges, those encountering Google location warrants should remain mindful of the limitations of the data and the absence of concrete answers from Google regarding their methodology for determining location data.


[i] The exception is for a user who has turned location services to always on, has a Google product open on a device, and has allowed for background app refresh. That means that is likely that Google knows far more about the location history of android users than iPhone users. That’s important because approximately 52 percent of devices on mobile networks are iOS devices. https://www.statista.com/statistics/266572/market-share-held-by-smartphone-platforms-in-the-united-states/.

[ii] https://policies.google.com/technologies/location-data (“On most Android devices, Google, as the network location provider, provides a location service called Google Location Services (GLS), known in Android 9 and above as Google Location Accuracy. This service aims to provide a more accurate device location and generally improve location accuracy. Most mobile phones are equipped with GPS, which uses signals from satellites to determine a device’s location – however, with Google Location Services, additional information from nearby Wi-Fi, mobile networks, and device sensors can be collected to determine your device’s location. It does this by periodically collecting location data from your device and using it in an anonymous way to improve location accuracy.”)

[iii] https://support.google.com/nexus/answer/3467281?hl=en

[iv] See United States v. Chartrie, 19cr00130-MHL (EDVA 2020), ECF 1009 [Declaration of Marlo McGriff] (“A value of 100 meters, for example, reflects Google’s estimation that the user is likely located within a 100-meter radius of the saved coordinates based on a goal to generate a location radius that accurately captures roughly 68% of users. In other words, if a user opens Google Maps and looks at the blue dot indicating Google’s estimate of his or her location, Google’s goal is that there will be an estimated 68% chance that the user is actually within the shaded circle surrounding that blue dot.”)

[v] See Id. at 10 (“[I]f a user’s estimated location (i.e., the stored coordinates in LH) falls within the radius of the geofence request, then Google treats that user as falling within the scope of the request, even if the shaded circle defined by the 68% confidence interval falls partly outside the radius of the geofence request. As a result, it is possible that when Google is compelled to return data in response to a geofence request, some of the users whose locations are estimated to be within the radius described in the warrant (and whose data is therefore included in a data production) were in fact located outside the radius. To provide information about that, Google includes in the production to the government a radius (expressed as a value in meters) around a user’s estimated location that shows the range of location points around the stored LH coordinates that are believed to contain, with 68% probability, the user’s actual location.

[vi] Over the years, this practice has changed.  At one point, law enforcement only submitted one warrant requesting the three-step process.  In more recent cases, it appears as if Google requires a separate warrant. 

[vii] Id. at 4 (“After that search is completed, LIS assembles the stored LH records responsive to the request without any account-identifying information. This deidentified ‘production version’ of the data includes a device number, the latitude/longitude coordinates and timestamp of the stored LH information, the map’s display radius, and the source of the stored LH information (that is, whether the location was generated via Wi-Fi, GPS, or a cell tower)”).

[viii] Id. at 17

[ix] Id.

[x] Google has the unique identifier for Wi-Fi hotspots and Cell sites.  If this information was included in warrant returns, it would assist in verifying that the location information provided in the returns is accurate.

[xi] In the Matter of the Search of: Information Stored at Premises Controlled by Google, 20mc00392-GAF (NDIL 2020) provides a great overview of the Fourth Amendment issues relating to Google Geofence warrants.  See also https://www.eff.org/deeplinks/2020/07/eff-files-amicus-brief-arguing-geofence-warrants-violate-fourth-amendment

E-Discovery: Mobile Forensic Reports

By Sean Broderick and John C. Ellis, Jr.

[Editor’s Note: Sean Broderick is the National Litigation Support Administrator.  He provides guidance and recommendations to federal courts, federal defender organization staff, and court appointed attorneys on electronic discovery and complex cases, particularly in the areas of evidence organization, document management and trial presentation. Sean is also the co-chair of the Joint Working Group on Electronic Technology in the Criminal Justice System (JETWG), a joint Department of Justice and Administrative Office of the U.S. Courts national working group which examines the use of electronic technology in the federal criminal justice system and suggested practices for the efficient and cost-effective management of post-indictment electronic discovery. 

John C. Ellis, Jr. is a National Coordinating Discovery Attorney for the Administrative Office of the U.S. Courts, Defender Services Office. In this capacity, he provides litigation support and e-discovery assistance on complex criminal cases to defense teams around the country. Before entering private practice, Mr. Ellis spent 13 years as a trial attorney and supervisory attorney with Federal Defenders of San Diego, Inc. He also serves as a digital forensic consultant and expert.]

Most federal criminal cases involve discovery that originally came from a cell phone. CJA panel attorneys and Federal Defenders have now become accustomed to receiving “reports” generated from Cellebrite.[1] In this blog post, we will talk about the valuable information that may be contained in those Cellebrite generated reports and what form of production you can get the reports in. Spoiler alert: we suggest you request that you receive those reports in Cellebrite Reader format and not just default to the PDF format that you know and love.

We are going to cover:

  1. the basic concepts behind the forensic process that law enforcement uses when using Cellebrite UFED to extract information from a phone,
  2. what is a Cellebrite generated mobile forensic report (which Cellebrite calls extraction reports), and
  3. the pros and cons for the potential formats you can receive Cellebrite generated reports in.

Though there are a number of forensic tools that law enforcement may use to extract data from a phone, the most common is Cellebrite. We are going to discuss Cellebrite, but know there are others (e.g. Oxygen, Paraben, etc.). Many of the processes and principles that apply to Cellebrite will apply to other tools.

Basic concepts behind the forensic process

How does a digital forensic examiner get the data from the mobile phone? Extracting data from mobile devices (a.k.a. acquisition) is complex and requires a great amount of skill when done correctly. For purposes of this blog post, we are only going to focus on one concept, which is the type of extraction that was performed. In order to retrieve data from a mobile phone, an examiner attaches the mobile phone to a computer which has the Cellebrite UFED software, follows a series of protocols, and saves a portion of the data on an external storage device. In most cases, examiners will not retrieve all data that was on the mobile phone at the time of the extraction—this is based in part on the phone’s memory architecture. Moreover, the type of extraction that is performed on the device can limit the amount of data that is retrieved.

The following are the most common types of extractions for Android devices: (1) Logical (or Advanced Logical); (2) File System; and (3) Physical. As for Apple, the most common types are Logical (Partial) and Advanced Logical. Generally, physical extractions retrieve the most data. After the iPhone 4, physical extractions are currently no longer available with Cellebrite with an iPhone device.

After a digital forensic examiner does an extraction of a phone (for this example, we will assume that the extraction was done through the Cellebrite UFED4PC), it generates an extraction files/folders, along with a .UFD (text) file that tells Cellebrite Physical Analyzer basic information about the extraction (such as which UFED was used, start and finish time, and hash information). The extraction files can be produced in a number of formats (.zip and .bin are common examples) depending on the type of extraction done. The takeaway here is that the type of extraction impacts the type and volume of data that was retrieved during the extraction process.

What is a Cellebrite generated report?

After extracting the data, the examiner uses Cellebrite Physical Analyzer to review the data retrieved from the mobile phone. The examiner also has the option of generating a report, which allows users without specialized forensic software to view the data retrieved from the mobile phone. As discussed below, the “extraction report” may be produced in multiple formats. Of note, the examiner can apply filters to decide what data types to export (e.g. emails, images, instant messages, searched items, etc.), and can further filter the data by date range. These reports are limited to the data extracted from the original device; the parameters of the forensic program dictated by the forensic examiner. The takeaway here is that a report does not necessarily include all data that was retrieved during the extraction.

Option for the Cellebrite generated report (extraction report)

Cellebrite generated reports, like the extractions described above, contain information from the mobile phone. This may include text messages, emails, call logs, web browsing history, location data, etc. They can be produced in a number of formats, though the most common are .PDF, .HTML, and .UFDR. There are pros and cons for each format of report.

PDF

Report in PDF format

There are several pros to receiving a Cellebrite generated report in PDF. CJA panel attorneys and Federal Defender defense teams are used to working PDFs. It is easy to add Bates stamps to them. They work on Macs. And they can be annotated and highlighted.

But there are also several important cons that make PDF a less desirable file type for Cellebrite generated reports. For instance, because phones have the capacity to contain large volumes of data, the reports generated from extractions can be quite large. A Cellebrite generated PDF report can easily reach 10,000 pages, which can cause a computer to slow down or even crash. Moreover, users cannot sort or filter data, hide data fields, or search within search results. In short, although PDFs are a convenient file type, it is not the most useful or efficient format for reviewing these types of reports.

HTML

Report in HTML format

There are several pros to receiving a Cellebrite generated report in the HTML format. The files load fast and can be viewed in any browser (such as Chrome, Firefox or Safari). In this format, each data type, such as SMS Messages, are hyperlinked and open in a new browser. (Please note that the hyperlinks only work if the file and the data are provided with the HTML file which can easily get overlooked when people move data.) Moreover, it is easy to search within HTML files and they operate on Macs.

But like PDFs, HTML files have several notable cons. First, you cannot sort or filter the data. Nor can you hide data fields. And you cannot easily generate reports for other subsets of information. Although HTML files are easy to use, they have significant limitations when it comes to reviewing reports.

UFDR

Report in UFDR format

The best format for receiving Cellebrite generated reports is the Cellebrite Reader format. The Cellebrite Reader format allows a user to create reports containing all data, or a portion thereof, in multiple formats including PDF, HTML and UFDR. So, if you receive if in UFDR format you can easily convert it to PDF or HTML later on (which is not possible if you receive it in HTML or PDF). Additionally, in this file format, users can sort and filter data, can search within results, can move or reorder data within columns, and can create tags—which is a convenient way to organize large volumes of discovery. And a user can open multiple UFDR files at the time and search across them. This allows a user to, amongst other things, search for keywords across multiple devices simultaneously.

The one downside to UFDR files is that they will not work on a Mac. You also need to have the free Cellebrite Reader program to open and use the UFDR file. Overall, this is the format you should request when speaking to the government about what form you would like reports generated from Cellebrite produced in.

Final note about formats: When deciding about your preferred format to review a Cellebrite generated report, remember that it is easy for an examiner to select all three formats at the same time. Often, an examiner will provide all three to make it easier for people to review the data in the way they want.

Conclusion

Mobile forensic reports are a ubiquitous part of discovery. When reviewing them, it is important to remember that the information in the report is limited by the limitations of retrieving data from mobile devices, the type of extraction performed on the device, and the data the examiner decided to include in the report. And the form of production of the report can affect how you review the data. Attorneys should consider contacting an expert or consultant if they have questions about the contents of a report.

Of note, Troy Schnack, Computer System Administrator for Federal Public Defender Office in Kansas City, Missouri, will be doing a webinar on mobile devices and will go into detail regarding Cellebrite Reader on Tuesday, September 22, 2020. Please register for the program on fd.org – we highly recommend it.


[1] Cellebrite UFED is a mobile forensic software program that allows trained users to extract and analyze phone call history, contact information, audio, photos, and videos and texts from mobile phones or forensic images of mobile devices produced as part of discovery. It has wide coverage for accessing digital devices from Android to Apple, with more than 31,000 device profiles of the most common phones. Cellebrite UFED can come as software only or can include a physical unit with accessories such as tip and cable set to connect to various mobile devices.

 

Ephemeral Messaging Apps

[Editor’s Note: John C. Ellis, Jr. is a National Coordinating Discovery Attorney for the Administrative Office of the U.S. Courts, Defender Services Office. In this capacity, he provides litigation support and e-discovery assistance on complex criminal cases to defense teams around the country. Before entering private practice, Mr. Ellis spent 13 years as a trial attorney and supervisory attorney with Federal Defenders of San Diego, Inc. He also serves as a digital forensic consultant and expert.]

Ephemeral Messaging Apps are a popular form of communication. With privacy a concern for everyone, using a self-destructing message that works like disappearing ink for text and photos has a certain allure. All messages are purposely short-lived, with the message deleting on the receiver’s device, the sender’s device, and on the system’s servers seconds or minutes after the message is read. Although these apps were initially only used by teenagers, they are now a ubiquitous part of corporate culture.

According to the 6th Annual Federal Judges Survey, put together by Exterro, Georgetown Law CLE, and EDRM, 20 Federal Judges were asked “[w]hat new data type should legal teams be most worried about in the 5 years?”[1]  The overwhelming response was “Ephemeral Apps (Snapchat, Instagram, etc.).” Id.  In fact, 68% of those surveyed believed ephemeral messaging apps where the most worrisome new data type, whereas only 16% responded that biometric data (including facial recognition and fingerprinting) were the greatest risk. Only 5% were concerned with Text Messages and Mobile, and 0% were concerned with the traditional social media such as Facebook and Twitter.  Id.

Even now, Courts are attempting to sort out the evidentiary issues cause by ephemeral messaging apps, see e.g., Waymo LLC v. Uber Technologies, Inc. 17cv0939-WHA (NDCA).  This article discusses popular ephemeral messaging apps and discusses guidelines for addressing potential evidentiary issues.

Short technical background:

There are several background definitions relevant to this discussion:

  1. Text Messages – otherwise known as SMS (“Short Message Service”) messages, text messages allow mobile device users to send and receive messages of up to 160 characters. These messages are sent using the mobile phone carriers’ network. Twenty-three billion text messages are sent worldwide each day.  Generally, mobile carriers do not retain the contents of SMS messages, so the records will only show the phone number that sent or received the messages and the time it was sent or received.
  2. Messaging Apps – allow users to send messages not tethered to a mobile device (e., a phone number). With some apps, a user may send messages from multiple devices. These apps include iMessage, WhatsApp, and Facebook Messenger. Messaging Apps are generally free. Unlike text messages, these apps rarely have monthly billing records or records showing when messages were sent or received.
  3. Ephemeral Messaging Apps – are a subset of Messaging Apps that allow users to cause messages (words or media) to disappear on the recipient’s device after a short duration. The duration of the message’s existence is set by the sender. Messages can last for seconds or days, unless the receiver of the message takes a “screenshot” of the message before its disappearance.
  4. End-to-End Encryption – also known as E2EE, this is a type of encryption where only the communicating parties can decipher the messages, which prevents eavesdroppers from reading them in transit.

Common Disappearing Messaging Apps:

Messaging apps, like all apps, are changing.  The following is a list and description of several popular ephemeral messaging apps.


Snapchat – both a messaging platform and a social network. The app allows users to send messages and media (including words and emojis appearing on the media) that disappear after a set period of time. Photos and videos created on Snapchat are called “snaps.” Approximately 1 million snaps are sent per day.

Signal – an encrypted communications app that uses the Internet to send one-to-one and group messages which can include files, voice notes, images and videos, which can be set to disappear after a set period of time. According to Wired, Signal is the one messaging app everyone should be using.

Wickr Me – a messaging app that allows users to exchange end-to-end encrypted and content-expiring messages, including photos, videos, and file attachments.

Telegram – cloud-based instant messaging app with end-to-end encryption that allows users to send messages, photos, videos, audio messages and files. It has a feature where messages and attachments can disappear after a set period of time.

CoverMe – a private messaging app that allows users to exchange messages, files, photographs, and phone calls from a fake (or “burner”) phone number. It also allows for private internet browsing, and llows users to hide messages and files.

Confide – a messaging app that allows users to send end-to-end encrypted messages.  The user can also send self-destructing messages purportedly screenshot-proof.

Evidentiary Issues:

Messaging app data, like other forms of evidence, must, amongst other criteria, be relevant (Fed.R.Evid. 401); authenticated (Fed.R.Evid. 901 et seq); and comply with the best evidence rule (Fed.R.Evid 1001 et seq).

As for the Best Evidence Rule, based on the nature of disappearing messaging apps, the original writing of the message is not preserved for litigation. See Fed.R.Evid. 1004(a) (finding that the original is not required if “all the originals are lost or destroyed, and not by the proponent acting in bad faith.”) Sometimes, the contents of the message may be established by the testimony of a witness. In other cases, the contents of the message may be based on a screen shot of the message.

Authenticating messages from apps, regardless of their ephemeral nature, is often difficult—text messages can be easily faked. When it comes ephemeral messages, we often must rely upon a screenshot or testimony regarding the alleged contents of the message.  In such circumstances, the following factors—repurposed from Best Practices for Authenticating Digital Evidence—are useful[2]:

  • testimony from a witness who identifies the account as that of the alleged author, on the basis that the witness on other occasions communicated with the account holder;
  • testimony from a participant in the conversation based on firsthand knowledge that the screen shot fairly and accurately captures the conversation;
  • evidence that the purported author used the same messaging app and associated screen name on other occasions;
  • evidence that the purported author acted in accordance with the message (e.g., when a meeting with that person was arranged in a message, he or she attended);
  • evidence that the purported author identified himself or herself as the individual sending the message;
  • use in the conversation of the customary nickname, avatar, or emoticon associated with the purported author;
  • disclosure in the message of particularized information either unique to the purported author or known only to a small group of individuals including the purported author;
  • evidence that the purported author had in his or her possession information given to the person using messaging app;
  • evidence that the messaging app was downloaded on the purported author’s digital device; and evidence that the purported author elsewhere discussed the same subject.

Conclusion:

Ephemeral messaging app data will continue to impact investigators, attorneys, and the Court. Defense teams should be prepared for the challenges ephemeral messages cause from investigations to evidentiary issues.


[1]Available at https://www.exterro.com/2020-judges-survey-ediscovery.

[2] Hon. Grimm, Capra, and Joseph, Best Practices for Authenticating Digital Evidence (West Academic Publishing 2016), pp. 11-12.

 

E-Discovery: Computer Forensic Images and Computer Forensic Reports

[Editor’s Note: John C. Ellis, Jr. is a National Coordinating Discovery Attorney for the Administrative Office of the U.S. Courts, Defender Services Office. In this capacity, he provides litigation support and e-discovery assistance on complex criminal cases to defense teams around the country. Before entering private practice, Mr. Ellis spent 13 years as a trial attorney and supervisory attorney with Federal Defenders of San Diego, Inc. He also serves as a digital forensic consultant and expert.]

CJA panel attorneys frequently ask me for strategies for how to manage and review computer forensic images they receive in discovery. It is a great question. Forensic images are often difficult for CJA panel attorneys to access, and they can contain an immense amount of information (often much more than the rest of the discovery production). Without opening them, they already know that a lot of the information in the forensic image is irrelevant. But they also know that often crucial information is in the forensic image that is important for them to know so they can prepare their client’s defense.

Short technical background:

There are two ways data from a computer is provided in discovery:

  1. Duplicates, which refers to “an accurate and complete reproduction of all data objects independent of the physical media”; or
  2. Forensic Images, which refers to “a bit stream copy of the available data” (see SWGDE Digital & Multimedia Evidence Glossary, June 2016).

Usually the government provides forensic images.  The forensic image is created using specialized software such as opentext EnCase or AccessData Forensic Toolkit (FTK). These forensic images cannot be opened without specialized software. Although there are free viewer programs, such as AccessData’s FTK Imager, which enable users to review the contents of forensic images, the process can be time-consuming and difficult.

Computer Forensic Reports

Isn’t there a better way? Yes, there is. Computer Forensic Reports (there are caveats). But first, why are they important and relevant to you?

Besides the forensic image that the government provides you, they may also provide you something called a Forensic Report (or forensic program generated report). Two common examples for computers will be an EnCase Report or an FTK Report. These reports, generated through the forensic software program, can allow you to see and review the information extracted from the image in a more user-friendly way. This can frequently mean you won’t need to use a forensic image viewer or a computer expert to assist you.

FTK HTML Report

FTK HTML Report

Now these computer forensic reports are not the same as a law enforcement report written by an agent discussing what information was on a computer and describing the evidence they think may be relevant to the criminal investigation. These forensic reports are generated through the forensic tool that was used to examine the data found on the device.

So, the first thing you should do when the government provides a forensic image to you is to ask the government if they have a forensic report as well and request a copy.

Forensic reports are useful because they can make it much easier for a legal professional to review data extracted from the device without having to use a forensic tool. Since most forensic examiners work with law enforcement, they typically create these reports for case agents and prosecutors. The information in the report can include information about documents, images, emails, and web browsing history. These reports often show both the content of a file as well as the metadata (such as the date the document was created). These reports are limited to the data extracted from the original device, the parameters of the forensic program, and the choices made by the forensic examiner.

The forensic reports can be provided in a several formats, including PDF, Excel and HTML. Many forensic tools also include a reader or viewer program that is proprietary to the forensic too, such as Magnet’s AXIOM Portable Case, opentext’s EnCase and AccessData’s FTK also have reader or viewer programs. These forensic reports allow legal professionals to search, review, sort and filter information in ways that can be superior to reviewing the reports in PDF, HTML or Excel formats.

Axiom Portable Case

Axiom Portable Case

These reports are valuable and frequently provide most of the information that a legal team will need to understand the contents of a forensic image. It should be noted that forensic reports may not contain all data that was on the original digital device.  Therefore, counsel should consider engaging a forensic expert or consultant when he or she does not understand the forensic report or image.

[NOTE: Law enforcement will frequently generate a forensic report after completing an extraction from a mobile device. A common forensic report seen in federal criminal cases is a Cellebrite Reader Report. See the Mobile Forensic Reports post for more details.]