Cristin Traylor, Relativity: How to Scale Defensible Generative AI Results for Document Review


Extract from Cristin Traylor’s article “How to Scale Defensible Generative AI Results for Document Review”

Just as with the technology-assisted review (TAR) methods that preceded it by two decades, two fundamental questions remain at the heart of any conversation about whether generative AI is a viable option for document review for production or other targeted document requests:

  • First, is it accurate?
  • And second, is it defensible?

When TAR first became available, the defensibility of its use in litigation was rigorously scrutinized and debated. But a series of judicial decisions established it firmly, thanks in large part to the validation metrics that practitioners used to measure the accuracy of its results.

These statistical metrics included:

  • Recall: the percentage of all relevant documents in a population that the AI correctly predicted to be relevant. A higher value is better.
  • Precision: the percentage of documents the tool predicted to be relevant that are truly relevant. A higher value is better.
  • Elusion: the percentage of documents predicted to be not relevant that are actually relevant. A lower value is better.
  • Richness: the percentage of all documents in the collection that are relevant. Higher and lower values aren’t better or worse, but a lower value often requires larger sample sizes for validation testing and may result in a wider margin of error.
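The four metrics above can be sketched in code. This is an illustrative example only (not Relativity's implementation); it assumes reviewer-coded ground truth and AI predictions are available as parallel lists of booleans, with `True` meaning relevant:

```python
def review_metrics(predicted, actual):
    """Compute TAR/GenAI validation metrics.

    predicted, actual: parallel lists of booleans (True = relevant),
    where `actual` holds the reviewer-coded ground truth.
    """
    total = len(actual)
    true_relevant = sum(actual)                 # all truly relevant docs
    predicted_relevant = sum(predicted)         # docs the tool flagged relevant
    hits = sum(p and a for p, a in zip(predicted, actual))          # true positives
    missed = sum((not p) and a for p, a in zip(predicted, actual))  # relevant docs predicted not relevant
    predicted_not_relevant = total - predicted_relevant
    return {
        # share of all relevant docs the tool found — higher is better
        "recall": hits / true_relevant if true_relevant else 0.0,
        # share of flagged docs that are truly relevant — higher is better
        "precision": hits / predicted_relevant if predicted_relevant else 0.0,
        # share of "not relevant" predictions that are wrong — lower is better
        "elusion": missed / predicted_not_relevant if predicted_not_relevant else 0.0,
        # share of the whole collection that is relevant
        "richness": true_relevant / total if total else 0.0,
    }
```

For example, on a collection of eight documents where the tool flags three as relevant, two of those flags are correct, and one relevant document slips through, recall and precision are both 2/3, elusion is 1/5, and richness is 3/8. In practice these metrics are estimated from statistical samples rather than full-population truth, which is where the sample-size and margin-of-error considerations noted under richness come in.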

