Extract from Isha Marathe’s article “Legal Tech’s Problem: Generative AI Is Readily Available, But Not Legal Data”
To say that the legal sector is wanting for any more documents would be grossly misleading. In fact, the opposite is likely true. And yet, legal tech companies that are rolling out generative artificial intelligence tools and integrations are experiencing something akin to a lack of training data.
This isn’t because the data, packed into thousands of contracts and legal papers, doesn’t exist, but rather because it isn’t quite as trainable as many providers initially thought.
As more law firms and corporate legal departments restrict the use of their documents for training, and amid fears of data breaches associated with generative AI tools, legal tech vendors find themselves with a handful of shiny cars, but not enough fuel to run them.
To be sure, experts told Legaltech News that while the dearth of trainable data might put a strain on legal tech companies in the short term, there are ways around it—from using more general training sets to relying on fine-tuning AI on legalese. But different providers will experience varying levels of difficulty in going about training their tools.
What’s Making Swathes of Legal Data ‘Untrainable’?
John Brewer, AI officer and chief data scientist at e-discovery company HaystackID, said that in an ideal world, “if we wanted to train an e-discovery model to be good at reading the kind of data that we push through [in] large amounts on a regular basis, the way that we would do that is train it on actual authentic discovery data.”