Inside The Black Box: Excluding Evidence Generated by Algorithms

[Editor’s Note: John C. Ellis, Jr. is a National Coordinating Discovery Attorney for the Administrative Office of the U.S. Courts, Defender Services Office. In this capacity, he provides litigation support and e-discovery assistance on complex criminal cases to defense teams around the country. Before entering private practice, Mr. Ellis spent 13 years as a trial attorney and supervisory attorney with Federal Defenders of San Diego, Inc. He also serves as a digital forensic consultant and expert.]

For many years, law enforcement officers have used records generated by mobile carriers to place a mobile device in a general area. The records are called Call Detail Records (“CDRs”). CDRs are generated when a mobile device sends or receives calls and text messages. Mobile carriers likewise keep records of when data is used, such as when a device browses the internet. These records are called Usage Detail Records (“UDRs”). At times, the records generated by mobile carriers include the location of the cell site or cell sites and the direction of the antenna that connected with the mobile device.

Cell Site Location Information (“CSLI”) analysis is the practice of creating maps showing the possible coverage area of a cell site at the time a device was being used. For these purposes, it is important to keep in mind that the records only show the location of the cell site and the direction the antenna is facing. Recent technological improvements have resulted in mobile carriers now generating Enhanced Location Records (“ELRs”), which purport to show more precise location data. In AT&T parlance, such records are based on the Network Event Location System (“NELOS”). This location data is derived from proprietary algorithms.

In a recent federal case, the government, through a member of the Federal Bureau of Investigation’s (“FBI”) Cellular Analysis Survey Team (“CAST”), sought to introduce NELOS records in a trial. However, after a Daubert hearing where the CAST agent testified, the district court excluded the records, in part, because of concerns over the reliability of the algorithms used to determine the location data.

This article provides an overview of CSLI and NELOS records, discusses the order excluding NELOS records from trial, and provides practical advice for practitioners.

Overview:

When CDRs include cell site location data, analysts and law enforcement officers use these records to show the location of the cell site and the orientation of the sector. In North America, many cell towers contain three sets of antennas, with each set offering a specific coverage area.

Picture 1

To illustrate this point, Picture 1 is an overview picture of a multi-directional cell tower. Each blue arm is a sector. When a mobile device connects to a cell site, the mobile carrier often records the activity (e.g., a sent text message), the time of the activity, and the location of the cell site and sector that was used.

Using these three data points, analysts and law enforcement officers create maps showing the location of the cell site and the orientation of the sector. In Map 1, the arms are used to demonstrate the beamwidth of the sector, which in this case the records indicate is 120 degrees. The cone at the base of the triangle is only meant to show the orientation of the sector, not the coverage area. Moreover, analysts generally will not testify that the mobile device was within the triangle. The triangle is only meant to represent the location of the cell site and the orientation of the sector.
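
The geometry behind a map like Map 1 can be sketched in a few lines of Python. This is a minimal illustration only, assuming the records supply a sector azimuth (the compass direction the antenna faces) and a beamwidth. The function names and the sample azimuth of 30 degrees are hypothetical and are not drawn from any carrier's records or from the tools analysts actually use. The sketch computes the two edges of the wedge and checks whether a compass bearing falls between them, which speaks to orientation only and says nothing about how far the sector's coverage extends.

```python
def sector_edges(azimuth_deg: float, beamwidth_deg: float) -> tuple[float, float]:
    """Return the compass bearings (degrees) of the two edges of a sector wedge."""
    half = beamwidth_deg / 2.0
    left = (azimuth_deg - half) % 360
    right = (azimuth_deg + half) % 360
    return left, right


def bearing_in_sector(bearing_deg: float, azimuth_deg: float, beamwidth_deg: float) -> bool:
    """True if a bearing from the cell site lies within the wedge.

    This describes direction only; it does not model coverage distance.
    """
    # Signed angular difference folded into the range [-180, 180)
    diff = (bearing_deg - azimuth_deg + 180) % 360 - 180
    return abs(diff) <= beamwidth_deg / 2.0


# Hypothetical sector: antenna facing 30 degrees with a 120-degree beamwidth
edges = sector_edges(30, 120)
inside = bearing_in_sector(350, 30, 120)
outside = bearing_in_sector(180, 30, 120)
```

Note that the wedge wraps past north in this example, which is why the comparison is done on the angular difference and not on the raw edge bearings.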

Map 1

With NELOS records, on the other hand, the ELRs purport to show the location of a device as opposed to the location of the cell site. In the following example, the red pin represents the location of the device. The blue circle represents what AT&T calls the “Location Accuracy.” This accuracy ranges from approximately several meters to 10,000 meters. Some records are marked “location accuracy unknown.” As discussed below, the Location Accuracy is determined by proprietary algorithms used by AT&T.

Map 2

In Map 2, the ELR indicates that the “[l]ocation accuracy [is] likely better than 300 meters.” In other words, the phone was at the red pin or within the blue circle at a specific date and time. NELOS records, however, contain the following statement: “The results provided are AT&T’s best estimate of the location of the target phone. Please exercise caution in using these records for investigative purposes, as location data is sourced from various databases, which may cause the location results to be less than exact.” DE 156 at 23 (emphasis added).
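
What a claim like “likely better than 300 meters” means geometrically can be sketched as follows. This is an illustration under stated assumptions, not a description of how AT&T computes anything: it treats the reported point and accuracy as a simple circle on a spherical Earth and uses the standard haversine formula to test whether another point falls inside it. The coordinates, the function names, and the Earth radius constant are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_accuracy(pin: tuple[float, float], point: tuple[float, float], accuracy_m: float) -> bool:
    """True if `point` lies inside the circle of radius `accuracy_m` around `pin`."""
    return haversine_m(pin[0], pin[1], point[0], point[1]) <= accuracy_m


# Hypothetical reported location and two hypothetical places of interest
pin = (34.7465, -92.2896)
nearby = (34.7483, -92.2896)    # roughly 200 meters north of the pin
farther = (34.7515, -92.2896)   # roughly 556 meters north of the pin

nearby_inside = within_accuracy(pin, nearby, 300)
farther_inside = within_accuracy(pin, farther, 300)
```

The check is only as good as its inputs. If the reported point or the accuracy figure comes from an undisclosed algorithm, the circle inherits that uncertainty.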

To put the first two examples into perspective, Map 3 shows both traditional CSLI and the use of NELOS records.

Map 3

The NELOS demonstrative, even taking account of the “Location Accuracy,” still provides a much smaller, and thus more specific, area of where the phone activity took place.

United States v. Smith, et al. (4:19-CR-514-DPM) (EDAR):

Donald Smith and Samuel Sherman were charged in a five-count indictment with various crimes relating to a murder. See Docket Entry (“DE”) 1. The government sought to introduce the testimony of CAST Agent Mark Sedwick “that provider-based location data typically is collected by obtaining historical call detail records for a particular cellular telephone from the service provider, along with a listing of the cell tower locations for that service provider.” DE 102 at 1. According to the government, “[t]his data is then analyzed for the purpose of generally placing a cellular telephone at or near an approximate location or locations on a map at points in time.” Id.

The government sought to have Agent Sedwick testify “regarding the activity and approximate locations of the cellular telephones believed to have been utilized by Donald Bill Smith, Samuel Sherman, Racheal Cooper and Susan Cooper on the approximate dates and times relevant to the charges in the Indictment.” Id. at 1-2. Attached to the government’s motion is the report created by Agent Sedwick. Maps 4 and 5 are examples from Agent Sedwick’s report. Map 4 shows how Agent Sedwick mapped traditional CSLI, and Map 5 shows how he mapped the same time period using NELOS records:

 

Map 4
Map 5

Map 4 shows traditional CSLI mapping with the location of the cell site and the orientation of the sector. With Map 5, each circle represents the area in which the device was used. Here, there are four such events. For comparison, in Map 4, Agent Sedwick’s opinion is limited to testifying about the location of the cell site and the orientation of the sector, whereas with Map 5, the testimony is that the mobile device was within the circle.

Prior to trial, defense counsel challenged Agent Sedwick’s potential testimony, and the district court conducted a hearing to determine the admissibility of the records pursuant to Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). During the hearing, Agent Sedwick explained that AT&T created NELOS to “test the health of the 3G network for planning and troubleshooting. It is a passive system where, while the phone is on the control channel communicating with the network across the control channel, it would passively pull whatever location data it could pull or data to compute location from that device.” DE 156 at 8.

Agent Sedwick further explained: “NELOS also became the generic term for any kind of location data. So depending, there might be other databases that were also pulled into the NELOS report that we receive from AT&T. Just from that report there’s no way to determine what other databases that was pulled from.” DE 156 at 9.

Agent Sedwick also provided information about known issues with NELOS data, specifically based on Temporary Mobile Subscriber Identity (“TMSI”). By way of background, mobile devices are assigned an International Mobile Subscriber Identity (“IMSI”), a unique number used by mobile carriers, which establishes that the mobile device can operate on a specific network. This is the number used by mobile carriers when creating CDRs. At times, however, in order to mask a device’s actual IMSI, networks assign the device a TMSI.[1] This is problematic for NELOS records because as Agent Sedwick explained, “[t]hat TMSI sometimes can get reallocated and then allocated back to a device, so you can have sometimes where the NELOS data will pull from a different device and get put into the records for the device that you’re requesting.” DE 156 at 10.
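
The misattribution risk Agent Sedwick described can be illustrated with a toy model. The sketch below is hypothetical in every respect: the data structures, identifiers, and time units are invented for illustration, and nothing here reflects how NELOS or any carrier system actually stores or joins its data, which is not public. It simply shows that if location events are matched to a subscriber by TMSI alone, events from a different device that held the same TMSI in between can end up in the target's records, whereas matching on both TMSI and allocation window avoids that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    tmsi: str
    imsi: str
    start: int  # inclusive, arbitrary time units
    end: int    # exclusive


@dataclass(frozen=True)
class LocationEvent:
    tmsi: str
    time: int
    label: str


def attribute_by_tmsi_only(events, allocations, target_imsi):
    """Naive join: keep every event whose TMSI was ever held by the target."""
    target_tmsis = {a.tmsi for a in allocations if a.imsi == target_imsi}
    return [e for e in events if e.tmsi in target_tmsis]


def attribute_by_tmsi_and_time(events, allocations, target_imsi):
    """Keep an event only if the target held that TMSI when the event occurred."""
    return [
        e for e in events
        if any(
            a.imsi == target_imsi and a.tmsi == e.tmsi and a.start <= e.time < a.end
            for a in allocations
        )
    ]


# Hypothetical history: TMSI "T1" goes to device A, then device B, then back to A
allocations = [
    Allocation("T1", "IMSI-A", 0, 10),
    Allocation("T1", "IMSI-B", 10, 20),
    Allocation("T1", "IMSI-A", 20, 30),
]
events = [
    LocationEvent("T1", 5, "event-1"),
    LocationEvent("T1", 15, "event-2"),  # generated while device B held T1
    LocationEvent("T1", 25, "event-3"),
]

naive = [e.label for e in attribute_by_tmsi_only(events, allocations, "IMSI-A")]
careful = [e.label for e in attribute_by_tmsi_and_time(events, allocations, "IMSI-A")]
```

In this toy history the naive join attributes all three events to device A, including the one generated by device B. Whether and how a real system guards against this is exactly the kind of question that cannot be answered without access to the underlying algorithm.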

During cross-examination, Agent Sedwick was questioned about the portion of NELOS records that advises users to “exercise caution in using these records for investigative purposes.” Agent Sedwick responded: “I wouldn’t rely on it if all I had was a NELOS point putting someone at a scene and that’s all I had, no, I would not use it. I’m using it—there is a caution with it, but I’m using it in the context of I have call and text to support it, I have other data to support, I have very good precise NELOS data. I feel very, very confident that this is accurate.” DE 156 at 24.

Agent Sedwick’s confidence in the accuracy of NELOS records was based on the proprietary algorithms created by the phone company. See DE 156 at 12 (“Question: Okay. So the device is sending various different events, they’re plugged into that algorithm, and essentially the algorithm will spit out what it computes as accuracy; is that correct? Answer: Yes, ma’am”). But Agent Sedwick acknowledged that he was not privy to the algorithm and did not know whether AT&T had tested NELOS for reliability. Instead, Agent Sedwick testified he believed the algorithms are reliable “[b]ecause AT&T relies on that to make multi-million-dollar decisions on how they’re going to design their network.” DE 156 at 32.

In granting the defense’s motion to exclude NELOS data, the district court found:

What particularly concerns me, though, is this mystery algorithm that our—and the proprietary software. We don’t know, I don’t know exactly what is in the algorithm, and the agent gave some testimony at a general level about the kind of information that goes in, but it seems to me that I’m missing a—an important foundational stone there of something with more specificity as to the kinds of things that the algorithm uses and how the algorithm does its work.

We know that there are disturbances from time to time, or anomalies as was called, with the TMSI number. I also—I acknowledge some uncertainty about TMSI numbers and how many devices that might be connected with and how it is that the algorithm might deal with that. So there’s that. Then there is, in my view, almost a—so we’ve got our black box there, which is concerning, and I would say at this point there’s a peer review problem, as well, because I don’t have any scholarly literature or evaluation of the black boxes or the kind of things that could go into this black box and how it would work.

I understand about the corroboration, but I still find myself at sea of understanding how it is the—how things happen in the black box and whether—whether what comes out of the black box is sufficiently reliable that the jury can rely on it.

DE 156 at 85-87 (emphasis added).

Based on this, the district court entered the following order: “Agent Sedwick may testify about call detail records and historical cell-site analysis; but he may not testify about NELOS data and analysis.” DE 154.

Further Consideration:

The district court’s exclusion of NELOS records was based, in part, on the use of data generated by untested algorithms. Other mobile carriers also produce ELRs, whose purported location data are likewise based on proprietary algorithms similar to NELOS. In seeking to exclude ELRs, as well as other forms of computer-generated data, counsel should encourage courts to question the reliability of evidence created by algorithms that lack independent validation and verification.

Glossary:

Acronym   Full Title
CAST      Cellular Analysis Survey Team
CDR       Call Detail Records
CSLI      Cell Site Location Information
ELR       Enhanced Location Records
IMSI      International Mobile Subscriber Identity
NELOS     Network Event Location System
TMSI      Temporary Mobile Subscriber Identity
UDR       Usage Detail Records

[1] As explained by EFF, “upon first connecting to a network, the network will ask for your IMSI to identify you, and then will assign you a TMSI … to use while on their network. The purpose of the pseudonymous TMSI is to try and make it difficult for anyone eavesdropping on the network to associate data sent over the network with your phone.” See https://www.eff.org/wp/gotta-catch-em-all-understanding-how-imsi-catchers-exploit-cell-networks.