
Effective Search Is a Complex Art Form


You rely on search every day, whether you are immersed in an eDiscovery project, shopping online, or deciding where to go for dinner. But the intent behind a search, and how its results are treated, differ sharply between everyday life and a high-stakes eDiscovery project. Terminology, grammar, and precise wording are critical to a well-run search and can make the difference between finding the smoking gun and missing it.

Search Technology

Common search technology is designed to find the best results, not all of them. Imagine having to search the internet for every instance of your name and then review each one. Yet that is the typical eDiscovery task: find every document containing a particular term so those documents can be reviewed and, ultimately, some of them produced.
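To make the contrast concrete, here is a minimal Python sketch, assuming a tiny made-up document collection and a naive term-count score (neither reflects how any real engine ranks results). The hypothetical `best_results` function behaves like everyday search and keeps only the top hits, while `all_results` behaves like an eDiscovery search and returns every match.

```python
# Hypothetical mini-collection: document ID -> text (illustrative only).
DOCS = {
    "DOC-001": "gas can lid recall notice",
    "DOC-002": "quarterly sales report",
    "DOC-003": "gas can spout design and gas can handle",
    "DOC-004": "notes on the gas can warranty",
}


def score(text: str, term: str) -> int:
    """Naive relevance score: how many times the term appears in the text."""
    return text.lower().count(term.lower())


def best_results(term: str, k: int) -> list[str]:
    """Everyday-style search: keep only the k highest-scoring documents."""
    hits = [(score(text, term), doc_id) for doc_id, text in DOCS.items()]
    hits = [pair for pair in hits if pair[0] > 0]
    # Highest score first; ties broken by document ID for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:k]]


def all_results(term: str) -> list[str]:
    """eDiscovery-style search: return every document containing the term."""
    return sorted(doc_id for doc_id, text in DOCS.items() if score(text, term) > 0)


print(best_results("gas can", 1))
print(all_results("gas can"))
```

Here `best_results("gas can", 1)` returns only DOC-003, while `all_results("gas can")` returns all three matching documents. In a review, the two documents the ranked search left out could be the ones that matter.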

Certain everyday search technologies haven’t yet made their way into eDiscovery. The best examples are the handling of misspellings, synonym analysis, and rudimentary entity analysis. In a Google search, misspellings are handled with “did you mean” suggestions, and the engine automatically searches for the corrected spelling. In eDiscovery, however, you may deliberately want to search for both the misspelling and the correct spelling of a word. Synonym analysis is common in everyday search: you look for one term but also get results for different words that mean the same thing. This is less common in eDiscovery, though it does appear in the clustering technologies of analytics and in similar-document searches. Entity analysis-powered search uses the context around a word to work out what it means, for example, to identify slang.
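One way to search for a misspelling and the correct spelling together is to expand the query to every indexed term within a small edit distance. The sketch below assumes a hypothetical list of indexed terms and uses classic Levenshtein distance; the function names are made up for illustration, and this is not any particular eDiscovery platform’s fuzzy search.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def expand_term(term: str, index_terms: set[str], max_edits: int = 1) -> list[str]:
    """Return every indexed term within max_edits of the query term, so the
    correct spelling and its near-misspellings are searched together."""
    return sorted(t for t in index_terms if edit_distance(term, t) <= max_edits)


# Hypothetical terms pulled from a document index.
INDEX_TERMS = {"definitely", "definately", "defiantly", "infinitely"}

print(expand_term("definitely", INDEX_TERMS))
```

With `max_edits=1`, the query picks up the common misspelling “definately” but not “defiantly” or “infinitely.” Raising the threshold catches more variants at the cost of more false hits, which is exactly the kind of trade-off worth testing before running terms across a full collection.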

Order of Operations

Many people remember the order of operations from grade school, and it matters in search too. You can string together as many ANDs as you want without changing the result, just as the order of the numbers in an addition problem doesn’t change the sum. But the moment you add an OR or a NOT to that AND search, the meaning changes completely.
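Treating each term as the set of documents that contain it shows why AND is order-independent while NOT is not. The posting lists below are hypothetical; the sketch simply maps AND to set intersection and AND NOT to set difference.

```python
# Hypothetical posting lists: the documents that contain each term.
POSTINGS = {
    "contract": {1, 2, 3, 4},
    "breach": {2, 3, 5},
    "penalty": {3, 5, 6},
}


def docs(term: str) -> set[int]:
    """Return the set of document numbers containing the term."""
    return POSTINGS.get(term, set())


# AND maps to set intersection, which is commutative and associative,
# so any ordering of the same AND terms returns the same documents.
and_1 = docs("contract") & docs("breach") & docs("penalty")
and_2 = docs("penalty") & docs("contract") & docs("breach")

# AND NOT maps to set difference, which is not commutative:
# swapping the two terms changes the answer.
not_1 = docs("contract") - docs("breach")  # contract AND NOT breach
not_2 = docs("breach") - docs("contract")  # breach AND NOT contract

print(and_1 == and_2, sorted(not_1), sorted(not_2))
```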

Without parentheses, a search parser could interpret a query such as A AND B OR C in two different ways. Is the AND tied to the OR, as in (A AND B) OR C? Or does it stand on its own, as in A AND (B OR C)? Unfortunately, there is no guarantee that every search parser uses the same default. The solution is to use parentheses, which force the search parser to evaluate the query in the order you intended.
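The two readings of an unparenthesized query can return very different document sets. Using hypothetical posting lists (made-up terms and document numbers), this sketch writes out both groupings of contract AND breach OR penalty with explicit parentheses.

```python
# Hypothetical posting lists: the documents that contain each term.
POSTINGS = {
    "contract": {1, 2, 3, 4},
    "breach": {2, 3, 5},
    "penalty": {3, 5, 6},
}


def docs(term: str) -> set[int]:
    """Return the set of document numbers containing the term."""
    return POSTINGS.get(term, set())


# The unparenthesized query "contract AND breach OR penalty" has two readings.
# Reading 1: the AND is evaluated first, then the OR.
reading_1 = (docs("contract") & docs("breach")) | docs("penalty")
# Reading 2: the OR is grouped first, and the AND applies to the whole group.
reading_2 = docs("contract") & (docs("breach") | docs("penalty"))

print(sorted(reading_1))  # [2, 3, 5, 6]
print(sorted(reading_2))  # [2, 3]
```

The first reading returns four documents and the second only two, even though the query text is identical. Explicit parentheses remove the ambiguity regardless of which default a given parser uses.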

Figure: Feeding, Dog and Family Example

Common Mistakes

Common search mistakes include mishandling dates and misunderstanding how keyword search technology works.

One of the most frequent missteps in basic search involves dates: failing to account for time zones or time zone normalization, and failing to use greater than, less than, and equal to operators when specifying date ranges. Processing data in Coordinated Universal Time (UTC) eliminates the time zone problem because every search can then be applied on a single clock. Documents can still be produced in a specific time zone later if needed. This has become particularly important as daylight saving time becomes less consistently observed.
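The sketch below shows why normalizing to UTC matters. It assumes hypothetical message timestamps with fixed UTC offsets (standard time only; real processing would use full time zone rules that account for daylight saving time). Two messages sent on March 1 by their senders’ local clocks both fall on March 2 once normalized, so a single UTC date range treats every message consistently.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets for illustration only; they do NOT model
# daylight saving time changes the way a full time zone database would.
CENTRAL_STANDARD = timezone(timedelta(hours=-6), "CST")
PACIFIC_STANDARD = timezone(timedelta(hours=-8), "PST")

# Hypothetical sent timestamps, each recorded in the sender's local time.
messages = {
    "MSG-1": datetime(2021, 3, 1, 23, 30, tzinfo=CENTRAL_STANDARD),
    "MSG-2": datetime(2021, 3, 1, 20, 0, tzinfo=PACIFIC_STANDARD),
    "MSG-3": datetime(2021, 3, 3, 1, 0, tzinfo=timezone.utc),
}


def to_utc(ts: datetime) -> datetime:
    """Normalize a timezone-aware timestamp to UTC."""
    return ts.astimezone(timezone.utc)


# One UTC window covering March 2 (inclusive start, exclusive end),
# applied identically to every message.
start = datetime(2021, 3, 2, tzinfo=timezone.utc)
end = datetime(2021, 3, 3, tzinfo=timezone.utc)
hits = sorted(m for m, ts in messages.items() if start <= to_utc(ts) < end)

for msg_id, ts in messages.items():
    print(msg_id, "local:", ts.isoformat(), "UTC:", to_utc(ts).isoformat())
print("Sent on March 2 (UTC):", hits)
```

Producing in a specific time zone later is just the reverse conversion, for example `to_utc(ts).astimezone(CENTRAL_STANDARD)`.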

When searching a range of dates, it is essential to understand the date field you are searching and how the range is applied. Using greater-than-or-equal-to and less-than-or-equal-to parameters, and widening the range to include a day before and a day after the actual dates, may lead to better results. Also consider which date you are searching, particularly with email: is it the sent date, received date, modified date, or the attachment’s modified date?
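Here is a minimal sketch of an inclusive, optionally padded date range applied to a chosen date field. The email records, the `in_range` helper, and the field names (`sent`, `received`) are hypothetical and not any platform’s schema.

```python
from datetime import date, timedelta

# Hypothetical email metadata; the field names are illustrative and do not
# come from any particular review platform.
emails = [
    {"id": "E1", "sent": date(2020, 6, 30), "received": date(2020, 7, 1)},
    {"id": "E2", "sent": date(2020, 7, 15), "received": date(2020, 7, 15)},
    {"id": "E3", "sent": date(2020, 7, 31), "received": date(2020, 8, 1)},
    {"id": "E4", "sent": date(2020, 8, 3), "received": date(2020, 8, 3)},
]


def in_range(items, field, start, end, pad_days=0):
    """Return IDs whose chosen date field satisfies start <= value <= end,
    with both bounds optionally widened by pad_days."""
    lo = start - timedelta(days=pad_days)
    hi = end + timedelta(days=pad_days)
    return [item["id"] for item in items if lo <= item[field] <= hi]


start, end = date(2020, 7, 1), date(2020, 7, 31)
print(in_range(emails, "sent", start, end))              # by sent date
print(in_range(emails, "received", start, end))          # by received date
print(in_range(emails, "sent", start, end, pad_days=1))  # padded by a day
```

For a July 2020 range, filtering on `sent` returns E2 and E3, filtering on `received` returns E1 and E2, and padding the `sent` range by one day on each side returns E1, E2, and E3. Same dates, three different result sets, which is why it pays to confirm the field and the range before running the search.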

Understanding search technology, and particularly its limitations, is critical when using keyword search. For example, the most commonly used eDiscovery search tool (DTSearch) includes a list of noise words in its default settings; most default deployments have over a hundred of them. This came into play in one of our cases many years ago. We were working for a manufacturer of gasoline cans, and one of its search terms for the litigation was “gas can.” DTSearch treats noise words much like white space, so AND, WERE, and THE were all considered safe to ignore, and CAN was on the list as well. As a result, a search for “gas can lid” using default settings was effectively interpreted as “gas,” followed by any word, followed by “lid,” and it returned results for phrases like “gas cap lid.” Likewise, searching for “gas can” (even though the phrase was in quotes) returned every document containing the word “gas,” which produced voluminous results.
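The gas can problem can be reproduced with a toy phrase matcher. This is a hypothetical model, not DTSearch’s actual implementation: it assumes a noise word inside a phrase matches any single word, and that noise words at the start or end of a phrase are simply dropped. The four-word noise list and the `phrase_matches` function are illustrative.

```python
import re

# Illustrative noise-word subset; real tools ship much longer default lists.
NOISE_WORDS = {"and", "were", "the", "can"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(phrase: str, text: str, noise=NOISE_WORDS) -> bool:
    """Hypothetical phrase search in which a noise word inside the phrase
    matches any single word, and noise words at either end are dropped."""
    query = tokenize(phrase)
    # Leading and trailing noise words constrain nothing, so strip them.
    while query and query[0] in noise:
        query.pop(0)
    while query and query[-1] in noise:
        query.pop()
    if not query:
        return False
    words = tokenize(text)
    # Slide the (possibly shortened) phrase across the document text.
    for i in range(len(words) - len(query) + 1):
        window = words[i:i + len(query)]
        if all(q in noise or q == w for q, w in zip(query, window)):
            return True
    return False


print(phrase_matches("gas can lid", "Replace the gas cap lid"))
print(phrase_matches("gas can", "Natural gas prices rose"))
print(phrase_matches("gas can", "Natural gas prices rose", noise=set()))
```

With the default-style noise list, “gas can lid” matches “gas cap lid” and “gas can” matches any text containing “gas.” Passing `noise=set()` (the equivalent of taking CAN off the list) restores the narrower behavior, which is why reviewing the noise word list belongs in search planning.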

Exercise Care

As data volumes continue to grow exponentially, careful planning, testing, and evaluation of search terms and results will go a long way toward delivering successful outcomes for you.

Dr. Gavin Manes
CEO at Avansic
Dr. Gavin Manes is a nationally recognized eDiscovery and digital forensics expert. He founded Avansic in 2004 after earning his doctorate in computer science from the University of Tulsa. At Avansic, Dr. Manes is committed to high-technology innovation, research, and mentorship, and has several patents pending. Avansic's scientific approach to eDiscovery and digital forensics stems from his academic experience.

Dr. Manes routinely serves as an expert witness and consults with attorneys on data preservation issues. He contributes academic content to peer-reviewed journals and delivers classroom lectures. See his full CV at gavinmanes.com.

Dr. Manes has published over fifty papers on eDiscovery, digital forensics, and computer security, along with countless blog posts, and has given educational presentations to attorneys, executives, professors, law enforcement, and professional groups on topics from eDiscovery to cyber law. He’s briefed the White House, the Department of the Interior, the National Security Council, and the Pentagon on computer security and forensics issues.

At the University, Dr. Manes formed the Tulsa Digital Forensics Center, housing Cyber Crime Units from local, state, and federal law enforcement agencies. He’s a founder of the University of Tulsa’s Institute for Information Security, leading the creation of nationally recognized research efforts in digital forensics and telecommunications security.
