Extract from Vasudeva Mahavishnu’s article “Decoding the Past: The Challenges of Predictive Coding in eDiscovery”
Electronic discovery (“eDiscovery”) is the process of identifying, gathering, and analyzing electronically stored information (ESI) for use in legal proceedings. As the volume of ESI grows, so does the need for sophisticated methods to search and interpret it. Predictive coding and Continuous Active Learning have been heralded as frontrunners in addressing this challenge, but both come with constraints. Meanwhile, newer technology offers a promising alternative, potentially reshaping the eDiscovery paradigm.
Predictive Coding in eDiscovery
Predictive coding is a technology-assisted review (TAR) method in which reviewers code a subset of documents for relevance, and a machine learning model trained on that subset then predicts the relevance of the remaining documents.
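The workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular eDiscovery product: it assumes a tiny hand-made seed set, a whitespace tokenizer, and a naive Bayes style word-count model with add-one smoothing. All names here (`seed_set`, `unreviewed`, `train`, `relevance_score`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens are enough for this toy example.
    return text.lower().split()


def train(coded_docs):
    """Build word counts per label from (text, is_relevant) pairs coded by reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in coded_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Return the log-odds that a document is relevant (positive means relevant)."""
    word_counts, doc_counts, vocab = model
    total_rel = sum(word_counts[True].values())
    total_not = sum(word_counts[False].values())
    # Prior log-odds; assumes the seed set contains both labels.
    score = math.log(doc_counts[True] / doc_counts[False])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in the seed set carry no evidence
        p_rel = (word_counts[True][word] + 1) / (total_rel + len(vocab))
        p_not = (word_counts[False][word] + 1) / (total_not + len(vocab))
        score += math.log(p_rel / p_not)
    return score


# Hypothetical seed set: the subset that human reviewers have coded.
seed_set = [
    ("merger pricing agreement draft", True),
    ("pricing terms for the merger", True),
    ("lunch menu for friday", False),
    ("office parking reminder", False),
]
# Hypothetical remaining documents that nobody has reviewed yet.
unreviewed = ["revised merger pricing schedule", "friday parking closure"]

model = train(seed_set)
predictions = {doc: relevance_score(model, doc) > 0 for doc in unreviewed}
ranked = sorted(unreviewed, key=lambda doc: relevance_score(model, doc), reverse=True)
```

In practice the ranked list would typically be used to prioritize which documents reviewers see first; the zero threshold used for `predictions` is an arbitrary choice made for this sketch.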
Overfitting in Predictive Coding
Overfitting is a common problem in machine learning and predictive modeling. It occurs when a model captures the noise or random fluctuations in the training data rather than the underlying distribution. In predictive coding, an overfit model can appear accurate on the documents reviewers have already coded while misjudging the relevance of documents it has not seen, which can have serious consequences for the document review process.
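One common way to detect overfitting is to compare accuracy on the coded training documents with accuracy on a held-out set the model never saw. The sketch below is a deliberately extreme toy, with assumed data and two hypothetical models: a `memorizer` that stores every training document verbatim, and a hand-picked `keyword_rule`. It is meant only to show what the gap looks like, not how a real review platform measures it.

```python
def accuracy(predict, docs):
    """Fraction of (text, label) pairs for which predict(text) matches the label."""
    return sum(predict(text) == label for text, label in docs) / len(docs)


# Hypothetical documents coded by reviewers and used for training.
train_docs = [
    ("merger pricing agreement draft", True),
    ("pricing terms for the merger", True),
    ("lunch menu for friday", False),
    ("office parking reminder", False),
]
# Hypothetical coded documents kept aside and never shown to the models.
held_out = [
    ("updated merger pricing schedule", True),
    ("merger agreement signature page", True),
    ("friday lunch order", False),
    ("parking garage closed", False),
]

# Overfit model: memorizes each training document and its label exactly,
# and falls back to "not relevant" for anything it has not seen.
memorized = dict(train_docs)


def memorizer(text):
    return memorized.get(text, False)


# Simpler model: a single rule chosen by hand for this illustration.
def keyword_rule(text):
    return "merger" in text.split()


results = {}
for name, predict in [("memorizer", memorizer), ("keyword_rule", keyword_rule)]:
    results[name] = (accuracy(predict, train_docs), accuracy(predict, held_out))
    train_acc, held_out_acc = results[name]
    print(f"{name}: train={train_acc:.2f} held_out={held_out_acc:.2f} "
          f"gap={train_acc - held_out_acc:.2f}")
```

Both models score perfectly on the training documents, but only the memorizer loses accuracy on the held-out set. A large gap between the two figures is the usual warning sign that a model has fit the particulars of the coded subset rather than a pattern that generalizes.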