News flash: Nearly a quarter of legal industry professionals expect generative AI and large language model (LLM) technology to have a transformative effect on eDiscovery this year, according to the 2024 State of the Industry report published by eDiscovery Today. In antitrust, money laundering, legal operations, and more, AI is poised to shake up current legal frameworks.
But while these tools have shown tremendous promise in streamlining eDiscovery review, they are most effective when used properly and in conjunction with a sound overall approach. Best practices are the key to success with GenAI, just as they are with any review methodology. In this article, we’ll explore how law firms and law departments can apply proven best practices in the age of AI.
When Keywords Become Prompts
One of the most attractive aspects of GenAI and LLMs is the perceived ease of use. The models seem so straightforward: Just type a prompt into the query box, click a button to send the prompt to the model, and get a response. Compared to creating a keyword search – the method many legal professionals use for document retrieval – this appears to be a snap. No syntax to learn; just prompt the model and get a response. Easy, right?
Not so fast. Just as poorly developed keyword searches can retrieve result sets that are under-scoped or over-broad, poorly written prompts can retrieve a lot of information from the GenAI model that you don’t need. Or worse, they fail to retrieve the information you do need. Just as there are best practices for constructing and refining keyword search terms, there are best practices for how you submit prompts to AI models.
Prompt engineering is the practice of crafting and refining queries so they communicate effectively with GenAI models and produce the desired outcomes. It’s an iterative approach – much like keyword searching when done correctly. Unfortunately, just as many legal professionals don’t test their keyword searches, many take the same untested approach to GenAI. Garbage in is garbage out, regardless of the technology.
Here’s an example of iterative prompt engineering. Let’s say we want some information about corporate internal investigations. We might start like this:
“What types of issues could lead to an internal investigation being initiated within [company name]?”
The AI might provide a general list of issues, such as financial irregularities, harassment and discrimination, breach of confidentiality, and theft of intellectual property.
Based on that response, we might want more specific information about a particular issue, like identifying the theft of intellectual property. The new prompt:
“What are signs that [employee name] may have stolen intellectual property?”
That can get you a more focused response, identifying signs like unusually large data transfers, increased use of USB drives, unauthorized installation of software that could be used to send or store data, and the employee’s sudden resignation.
With prompt engineering, you could continue to drill down into the particulars, like more detail on the use of USB drives by the employee. You can also tailor how the model responds (e.g., add “in less than 100 words” or “in a single paragraph”) if you want a more focused response.
You could even ask the model to discuss the topic or explain complex concepts in a certain way, such as “explain this as a forensic examiner would explain it” or “explain it in a way a 9-year-old could understand.” These personas further customize the AI model responses to fit your needs.
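To make that iteration concrete, here’s a minimal sketch of what the refinement above might look like when scripted against an OpenAI-style chat completions API. The model name, the persona, and the `ask` helper are illustrative assumptions, not features of any particular eDiscovery platform.

```python
# A minimal sketch of iterative prompt refinement, assuming access to an
# OpenAI-style chat completions API. Model name, prompts, and the helper
# function are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment


def ask(prompt: str, persona: str | None = None) -> str:
    """Send a single prompt, optionally framed with a persona."""
    messages = []
    if persona:
        messages.append({"role": "system", "content": persona})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical choice; use whatever model your platform provides
        messages=messages,
    )
    return response.choices[0].message.content


# Start broad ...
print(ask("What types of issues could lead to an internal investigation "
          "being initiated within [company name]?"))

# ... then drill down on one issue, constrain the length, and add a persona.
print(ask("What are signs that [employee name] may have stolen intellectual "
          "property? Answer in less than 100 words.",
          persona="Explain this as a forensic examiner would explain it."))
```

The point isn’t the code itself; it’s the workflow: start broad, review the response, then narrow, constrain, and reframe the next prompt accordingly.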
In short, retrieving documents in eDiscovery review takes a strategic, iterative approach. This is true whether you use keyword searches, technology assisted review (TAR), or GenAI.
Iteration = Preparation + Validation
Just as there are similarities between keyword and prompt best practices, the need to prepare for review up front and to validate the results remains the same regardless of the technical approach. Generative AI tools aren’t perfect, even with well-defined prompts. Reviewers must plan effectively, execute efficiently, and validate thoroughly. Here’s how:
- Define Clear Parameters and Objectives. A successful review begins with a sharp understanding of the legal issues and goals of the case. This includes understanding what constitutes “relevance,” how the parties communicate, and the key terms and acronyms involved. It also includes setting objectives for the review itself, whether that’s identifying responsive documents, privilege review, issue coding, or something else.
- Develop a Comprehensive Review Plan. Outline the steps and processes of the review, including timelines, priorities, and workflows. Decide on the technology to be leveraged (such as keyword searching, TAR, or GenAI tools) to assist the review process. These approaches aren’t mutually exclusive; it’s possible to have a review workflow that leverages two – or even all three – approaches.
- Define Protocols and Guidelines. Every approach will require some level of manual review, whether it’s to conduct a full manual review or to train the model to automate classifications. It’s important to create detailed coding manuals and document review protocols to ensure consistency across the review team. This includes definitions and examples of what constitutes responsive or privileged information.
- Train the Review Team. Provide comprehensive training for all reviewers on specific legal issues, use of the document review platform, and review protocols. Ensure they understand the nuances of the case and the review platform’s features.
- Conduct the Review Iteratively. Regardless of the methodology, it’s common to adjust your approach based on findings from quality control (QC) checks, changes in case strategy, or new information.
- Implement QC Checks. Regardless of the approach used, it’s important to establish rigorous inspection procedures throughout the review process. This involves spot-checking the work of individual reviewers, running consistency checks to confirm that similar documents are coded the same way, and verifying that reviewers follow the established guidelines.
- Test Results and Null Set. Apply statistical sampling to evaluate the completeness and accuracy of the review. Draw random samples from the result set of responsive documents as needed to confirm its accuracy. Also perform elusion testing of the “null set” (the documents predicted to be non-responsive, whether by human review or by TAR or GenAI classification) to estimate the proportion that are actually responsive; see the sketch after this list. This testing is often iterative, repeated until you achieve the desired results.
- Document Thoroughly. Keep detailed records of the review process, including decisions made, changes to the protocol, issues encountered, and test results. This documentation is key to demonstrating a defensible process to opposing counsel or the court.
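For readers who want to see what the elusion testing described above looks like in practice, here is a minimal sketch that scores a random sample drawn from the null set. The sample size of 385 (roughly a 95% confidence level with a 5% margin of error for large populations), the field names, and the 5% acceptance threshold are illustrative assumptions; your review protocol should define its own targets.

```python
# Minimal sketch of elusion testing on the null set. The sample size,
# field names, and 5% threshold are illustrative assumptions only.
import random


def elusion_test(null_set: list[dict], sample_size: int = 385,
                 threshold: float = 0.05) -> bool:
    """Estimate the elusion rate of the null set from a random sample."""
    # Draw a random sample of documents predicted to be non-responsive.
    sample = random.sample(null_set, min(sample_size, len(null_set)))

    # In practice these documents go to human reviewers; here we assume each
    # document dict already carries the reviewer's "responsive" coding.
    missed = sum(1 for doc in sample if doc["responsive"])
    elusion_rate = missed / len(sample)

    print(f"Sampled {len(sample)} documents; {missed} were responsive "
          f"(elusion rate {elusion_rate:.1%}).")

    # If the rate exceeds the threshold, refine the prompts, searches, or
    # model training and test again; the process is iterative.
    return elusion_rate <= threshold
```

If the test fails, the review team would adjust the workflow (new prompts, new search terms, or additional training documents) and repeat the test, which is the iteration the bullet above describes.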
When Andrew J. Peck became the first judge to approve TAR for review in the Da Silva Moore case in 2012, he wrote that TAR “is not a magic, Staples-Easy-Button solution appropriate for all cases” and that “it is not a case of machine replacing humans; it is the process used and the interaction of man and machine” that is at issue.
Generative AI isn’t a “magic, Staples-Easy-Button” either. Just like TAR, it’s a tool that needs to be properly used to be fully effective, as part of a review that is planned effectively, conducted efficiently, and validated thoroughly. Technology changes, but best practices for review – managed by humans – never change.