
Five Advanced Techniques for Self-Serve eDiscovery


You’ve made the leap: either you’re starting to do eDiscovery yourself, or you’re a seasoned eDiscovery veteran facing a heavier workload, unusual data, or a new platform. Or all of those! As long-time experts in this field, we’ve seen the issues that arise for people in this situation, and we’re sharing a few handy pointers to address them.

Cell Phone Data

Data extracted from cell phones can be difficult to load into an eDiscovery tool in a way that makes it easy to search and easy to integrate with preexisting project data. Because different extraction tools exist, some forensically sound and others that simply copy data, each collection may arrive in a different form.

There are three ways to bring cell phone data into a review tool: as screenshots, as a message database, or as load files. Screenshots provide no underlying metadata to assist in loading or searching. A message database from the phone can contain text messages, chat, multimedia messages, and other communications in the form of SQLite databases. Note, however, that text message data from most cell phones isn’t as metadata-rich as email. An email has very specific header information, while a text message may have as little as a received date. Most problematic, text messages lack the threading data that email commonly carries, such as a Conversation Index. This makes grouping conversations together difficult.
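To make the threading problem concrete, here is a minimal sketch of one way to group messages from a SQLite database into conversations when no threading metadata exists. The `messages` table, its column names, and the 12-hour gap rule are all assumptions made for illustration; real phone databases use their own schemas, and processing tools may group conversations quite differently.

```python
import sqlite3
from datetime import datetime, timedelta


def load_messages(conn):
    """Read rows from a hypothetical 'messages' table, oldest first."""
    rows = conn.execute(
        "SELECT id, sender, recipient, sent_at, body "
        "FROM messages ORDER BY sent_at, id"
    ).fetchall()
    return [
        {"id": r[0], "sender": r[1], "recipient": r[2],
         "sent_at": datetime.fromisoformat(r[3]), "body": r[4]}
        for r in rows
    ]


def assign_threads(messages, max_gap=timedelta(hours=12)):
    """Assign a thread_id: same pair of participants, no gap longer than max_gap."""
    last_seen = {}  # participant pair -> (thread_id, time of latest message)
    next_id = 1
    for msg in messages:
        key = frozenset((msg["sender"], msg["recipient"]))
        previous = last_seen.get(key)
        if previous and msg["sent_at"] - previous[1] <= max_gap:
            thread_id = previous[0]
        else:
            thread_id = next_id
            next_id += 1
        msg["thread_id"] = thread_id
        last_seen[key] = (thread_id, msg["sent_at"])
    return messages


def demo_connection():
    """Build an in-memory database with made-up sample messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, "
        "recipient TEXT, sent_at TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
        [
            (1, "+15550001", "+15550002", "2023-05-01T09:00:00", "Are you free?"),
            (2, "+15550002", "+15550001", "2023-05-01T09:05:00", "Yes"),
            (3, "+15550001", "+15550003", "2023-05-01T09:10:00", "Call me"),
            (4, "+15550001", "+15550002", "2023-05-03T08:00:00", "Following up"),
        ],
    )
    return conn


if __name__ == "__main__":
    for m in assign_threads(load_messages(demo_connection())):
        print(m["thread_id"], m["sent_at"].isoformat(), m["sender"], m["body"])
```

Grouping by participants and elapsed time is only a heuristic. It stands in for the conversation data that email provides in its headers and that text messages do not.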

In the past, some have argued for reviewing text messages in spreadsheets. However, cell phone data can be converted to load files, either directly or through middleware. Modern processing techniques make this possible by treating cell phone data much like email, assigning family groupings and conversation threading designed specifically for cell phone messaging.
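As a rough illustration of the conversion, the sketch below writes messages and their attachments to a delimited load file with family and thread identifiers. The field names, the `MSG` numbering prefix, and the pipe delimiter are assumptions chosen for readability; each review platform defines its own load file format, so check its documentation before relying on any of these.

```python
import csv
import io

# Hypothetical field layout; real platforms define their own.
FIELDS = ["DocID", "ParentID", "FamilyID", "ThreadID", "From", "To", "DocDate", "Text"]

# Made-up sample data, already carrying a thread_id from earlier processing.
SAMPLE_MESSAGES = [
    {"sender": "+15550001", "recipient": "+15550002",
     "sent_at": "2023-05-01T09:00:00", "body": "See the photo",
     "thread_id": 1, "attachments": ["IMG_0001.jpg"]},
    {"sender": "+15550002", "recipient": "+15550001",
     "sent_at": "2023-05-01T09:05:00", "body": "Got it",
     "thread_id": 1},
]


def build_rows(messages, prefix="MSG"):
    """Turn messages into load file rows; attachments share the parent's FamilyID."""
    rows = []
    counter = 0
    for msg in messages:
        counter += 1
        parent_id = f"{prefix}{counter:06d}"
        common = {"FamilyID": parent_id, "ThreadID": msg["thread_id"],
                  "From": msg["sender"], "To": msg["recipient"],
                  "DocDate": msg["sent_at"]}
        rows.append({"DocID": parent_id, "ParentID": "", "Text": msg["body"], **common})
        for name in msg.get("attachments", []):
            counter += 1
            rows.append({"DocID": f"{prefix}{counter:06d}", "ParentID": parent_id,
                         "Text": name, **common})
    return rows


def write_load_file(rows, handle):
    """Write rows as pipe-delimited text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="|",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_load_file(build_rows(SAMPLE_MESSAGES), buffer)
    print(buffer.getvalue())
```

Because each attachment row carries its parent's `FamilyID`, a review tool that honors family fields can keep a message and its media together, much as it would an email and its attachments.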

Sensitive Information

Keywords, dates, and analytic tools help look for concepts, but there is also a need to search for sensitive information reliably. It’s common for reviewers to assume they don’t have sensitive information, but they may be forgetting that they receive documents such as tax returns (which contain SSNs) or internal reports with bank account numbers.

Some platforms have integrated tools that allow searching for Social Security Numbers, phone numbers, bank accounts, and even common names. This is more complicated than entity identification or regular expression pattern searching since the data must be validated by an algorithm using context around the detected data. The most common use case is determining if a data breach contained sensitive information since a number of notifications and actions cascade from that determination.
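To show why validation and context matter beyond a bare pattern match, here is a minimal sketch for Social Security Numbers. It rejects number ranges that are never issued (area 000, 666, or 900 to 999; group 00; serial 0000) and flags whether a nearby keyword supports the hit. The keyword list and the 40-character window are assumptions for illustration, and commercial tools likely apply far richer checks.

```python
import re

# Matches only the dashed form NNN-NN-NNNN; other layouts are out of scope here.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

# Hypothetical context terms; a real deployment would tune this list.
CONTEXT_TERMS = ("ssn", "social security", "taxpayer")


def is_plausible_ssn(area, group, serial):
    """Reject combinations that are never issued as SSNs."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_ssns(text, window=40):
    """Return plausible SSN hits, each flagged by whether a context term is nearby."""
    hits = []
    lowered = text.lower()
    for match in SSN_PATTERN.finditer(text):
        if not is_plausible_ssn(*match.groups()):
            continue
        start = max(0, match.start() - window)
        context = lowered[start:match.end() + window]
        hits.append({
            "value": match.group(0),
            "start": match.start(),
            "end": match.end(),
            "has_context": any(term in context for term in CONTEXT_TERMS),
        })
    return hits


if __name__ == "__main__":
    for hit in find_ssns("Employee SSN: 123-45-6789. Part number 000-12-3456."):
        print(hit)
```

A hit with `has_context` set to false is not necessarily a false positive. It is simply a weaker candidate that a reviewer may want to check first.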

Modern tools have made searching for PII as easy as keyword searching, complete with hit reports, hit highlighting, and navigation within the platform. Combining this search technology with automated redaction creates an incredibly powerful screening tool for productions, document requests, or disclosures.

Automatic Redaction

In the most common workflow for automatic redaction, the eDiscovery tool is given a list of words or characters, locates them in documents, and applies black boxes over the items it finds. This workflow often leaves no way to review the automated process, or to determine what was redacted once the redactions are “burned in.”

Some tools offer either a review process or see-through redactions as an approval step. A see-through redaction works like a cover you can temporarily lift to confirm that what lies beneath was redacted correctly. Another option is “fuzzy” redaction, which combines text-based search, concept search, and sensitive data search. This expands the search beyond single words so that entire phrases can be redacted.
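The difference between burned-in and reviewable redaction can be sketched on plain text. In the example below, redactions are first proposed as a list a reviewer can inspect and approve or reject, and only approved items are masked. The `Redaction` class and both function names are hypothetical, and actual tools work on rendered pages and images, not just text.

```python
import re
from dataclasses import dataclass


@dataclass
class Redaction:
    """A proposed redaction that a reviewer can approve or reject."""
    start: int
    end: int
    text: str
    approved: bool = True


def propose_redactions(document, terms):
    """Locate each term (case-insensitive) without altering the document."""
    if not terms:
        return []
    # Try longer terms first so a phrase wins over a word it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    return [Redaction(m.start(), m.end(), m.group(0))
            for m in pattern.finditer(document)]


def apply_redactions(document, redactions, mask="\u2588"):
    """Mask approved redactions only, keeping the text length unchanged."""
    pieces = []
    cursor = 0
    for item in sorted(redactions, key=lambda r: r.start):
        if not item.approved:
            continue
        pieces.append(document[cursor:item.start])
        pieces.append(mask * (item.end - item.start))
        cursor = item.end
    pieces.append(document[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    doc = "Acct 12345 belongs to Jane Doe."
    proposed = propose_redactions(doc, ["Jane Doe", "12345"])
    proposed[0].approved = False  # reviewer decides the account number may stay
    print(apply_redactions(doc, proposed))
```

Keeping the proposal list separate from the masked output is what makes the process auditable: the list records what was found and what was approved, even after the final copy is produced.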

Expert Review of Data

Downloading or emailing documents to experts has been common practice in the past. Most eDiscovery tools now allow external users to view subsets of the data loaded within them. Tools have recently adapted to allow sharing that data through invitations instead of creating completely separate projects or data storage locations. This enables legal teams to share documents easily. They can also monitor access, much as in a clean room, significantly reducing the chance that data is leaked or misplaced. You can use the same technique to share data with clients while maintaining compliance with protective orders and other restrictive document designations. Keeping the data in one location is much safer than creating multiple electronic copies or printing.

Forensic Timeline Data

Digital forensic reports detail a series of events, supported by exhibits. These exhibits are evidence from the investigated device (computer, cell phone, or server). Integrating such a report into a typical discovery review is difficult because loading the native data into the review tool may not achieve the desired effect.

For example, forensic artifacts in timelines often come from system data such as registry entries (on Windows) or plist files (on macOS), which are stored in single large files. Loading these files as-is would not normalize their contents against the rest of the existing information, so the events inside them could not be sorted by document date alongside a file’s modified date or a text message’s sent date. However, since forensic tools can convert these large files into discrete events, the events can be added to load files and used in the loading process. In this case, when sorting by document date, the reviewer could see that someone logged in to a computer, opened their email client, received email sent the previous day, opened an attachment to one of those emails, then plugged in a thumb drive and saved that file. This would all show as “inline” with normal sorting during document review.
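The idea of events appearing “inline” can be sketched as follows: discrete forensic events are mapped to the same fields as ordinary documents and then sorted on a shared date. The field names, the `EVT` prefix, and the sample data are all invented for illustration and do not reflect any particular tool’s format.

```python
from datetime import datetime

# Made-up review documents that already carry a normalized DocDate.
DOCUMENTS = [
    {"DocID": "DOC000001", "DocDate": "2023-03-01T16:30:00",
     "DocType": "Email", "Description": "Q1 forecast (sent date)"},
    {"DocID": "DOC000002", "DocDate": "2023-03-02T08:04:00",
     "DocType": "Text Message", "Description": "Running late"},
]

# Made-up events, as a forensic tool might export them from system artifacts.
EVENTS = [
    {"timestamp": "2023-03-02T08:00:00", "source": "Security log", "action": "User logged in"},
    {"timestamp": "2023-03-02T08:02:00", "source": "Program history", "action": "Email client opened"},
    {"timestamp": "2023-03-02T08:05:00", "source": "Recent files", "action": "Attachment opened"},
    {"timestamp": "2023-03-02T08:07:00", "source": "USB history", "action": "Thumb drive connected"},
]


def event_to_row(event, number):
    """Map one forensic event onto the same fields used for documents."""
    return {
        "DocID": f"EVT{number:06d}",
        "DocDate": event["timestamp"],
        "DocType": "Forensic Event",
        "Description": f'{event["source"]}: {event["action"]}',
    }


def merge_timeline(documents, events):
    """Combine documents and events, then sort everything by DocDate."""
    rows = list(documents)
    rows += [event_to_row(e, i) for i, e in enumerate(events, start=1)]
    return sorted(rows, key=lambda row: datetime.fromisoformat(row["DocDate"]))


if __name__ == "__main__":
    for row in merge_timeline(DOCUMENTS, EVENTS):
        print(row["DocDate"], row["DocID"], row["DocType"], row["Description"])
```

Once events share a date field with documents, the usual sort in a review tool places the text message between the email client opening and the attachment being opened, with no special handling required.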


Digging deeper into analytics, special loading, and user-specific access can assist in scenarios where you have cell phone data, experts, or the need to handle personal information. Although these situations may not arise all the time, they occur often enough that it’s helpful to investigate the capabilities of your eDiscovery tool. If your tool doesn’t have the capabilities, maybe it’s time to consider a switch.

Dr. Gavin Manes
CEO at Avansic
Dr. Gavin Manes is a nationally recognized eDiscovery and digital forensics expert. He founded Avansic in 2004 after completing his Doctorate in Computer Science from the University of Tulsa. At Avansic, Dr. Manes is committed to high-technology innovation, research, and mentorship, and has several patents pending. Avansic's scientific approach to eDiscovery and digital forensics stems from his academic experience.

Dr. Manes routinely serves as an expert witness, including consulting with attorneys on data preservation issues. He contributes academic content to peer-reviewed journals and delivers classroom lectures. See his full CV at

Dr. Manes has published over fifty papers on eDiscovery, digital forensics, and computer security, written countless blog posts, and delivered educational presentations to attorneys, executives, professors, law enforcement, and professional groups on topics from eDiscovery to cyber law. He’s briefed the White House, the Department of the Interior, the National Security Council, and the Pentagon on computer security and forensics issues.

At the University, Dr. Manes formed the Tulsa Digital Forensics Center, housing Cyber Crime Units from local, state, and federal law enforcement agencies. He’s a founder of the University of Tulsa’s Institute for Information Security, leading the creation of nationally recognized research efforts in digital forensics and telecommunications security.
