
Legal AI Series [Chapter Five]: Unstructured Legal Data and the A.I. Lifeboat Solution


The world is drowning in data. Music data. Photo data. Video data. Spreadsheet, flow chart, email, and website data. These days, no matter where you are or what you’re doing, there’s someone or something nearby creating data, even if it’s just a CCTV camera or a satellite orbiting our planet. (Say hi to Big Brother!)

Look at you, dear reader. Even you are creating data right now as you browse this article, leaving a digital footprint that is being recorded in a big bank of computers somewhere.

Data isn’t a bad thing. It’s just… a thing. A product of our life in this modern era, where, even if you aren’t actively creating it yourself, someone nearby is probably creating it about you.

For modern attorneys, this digital information is the lifeblood of a case, especially when it comes to unstructured data, since electronically stored media often contains intel that is critical to success. However, the sheer volume of it is crushing attorneys, making it nearly impossible for legal teams to meet important discovery deadlines, keep client costs down, and stay sane, all at the same time.

Luckily, A.I. technology has the answer.

With the introduction of machine learning software, attorneys now have full control and command over all types of data evidence (instead of the other way around). These data strategy solutions enable legal professionals around the globe to meet modern problems with modern solutions.

Here’s what you need to know about structured vs. unstructured data, how both are affecting the legal industry, and how the A.I. legal revolution is providing solutions to the pressing dilemmas faced in e-discovery today.


The world might be drowning in digital data, but in the digital world, not all data is created equal. Indeed, when it comes to electronically stored information (ESI), data actually comes in two forms: structured and unstructured.

Structured data exists within the confines of an already established database or table (for example, a SQL database or, more loosely, an Excel spreadsheet). This material is fairly simple to search and categorize, since it already has that built-in support system, with clearly defined patterns and parameters.

Unstructured data, on the other hand, is more difficult to search, since it lacks the internal skeleton that structured data comes pre-equipped with. This type of data basically encompasses every other type of media you can think of, including emails, text messaging, office memos, videos, photos, CCTV footage, jailhouse recordings, and so much more.

Both structured and unstructured data appear often during document review, and each has the potential to contain vital, case-building insights that attorneys need to be successful. Unfortunately for contract attorneys, however, it just so happens that unstructured data (the “harder to search” variety) makes up the bulk of most e-discovery projects—and not by a little, either.

According to Forbes, a whopping 80% of electronically stored information that’s reviewed during e-discovery is unstructured. (Which actually makes sense when you think about it because, let’s be honest, the world isn’t drowning in Excel spreadsheets. It’s suffering in a cesspool of bad Tweets.)

Here’s a closer look at these two types of data, and how they’re each affecting legal document review.

Structured Data

First off, let’s take structured data: the easy “golden child” of electronically stored information.

This type of data typically resides in a relational database (run by a relational database management system, or RDBMS), which essentially just means that the data is organized and stored in tables. (Think of the rows and columns of an Excel sheet, although a spreadsheet is not a true relational database.) This information is generally short and succinct, and it can be anything from zip codes to social security numbers to payments and even names: basically, whatever can fit neatly into a row-and-column organization system.

However, relational databases are so much more than just a convenient way to store and organize your marbles…

As anyone who has used a database, or even Excel, to run payroll, keep track of projects, or compare yearly earnings knows, the power of an RDBMS isn’t just that it can store information. It’s also that it can relate that information to the data stored in other tables, and then compare and contrast the two.

What all of this adds up to is a body of information specifically designed to be easy to organize, search, filter, categorize, and compare, which makes it quite simple to navigate during electronic discovery.
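To make the "organize, search, filter, and compare" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are invented purely for illustration and do not come from any real matter; the point is only that two related tables can be filtered and totaled with a single query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clients (
            id   INTEGER PRIMARY KEY,
            name TEXT,
            zip  TEXT
        );
        CREATE TABLE payments (
            id        INTEGER PRIMARY KEY,
            client_id INTEGER REFERENCES clients(id),
            amount    REAL
        );
        INSERT INTO clients VALUES (1, 'Acme Corp', '92626'), (2, 'Globex', '10001');
        INSERT INTO payments VALUES (1, 1, 1200.0), (2, 1, 300.0), (3, 2, 950.0);
        """
    )
    return conn


def total_paid_by_zip(conn, zip_code):
    """Join payments to clients and total the amounts for one zip code."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.amount), 0)
        FROM payments AS p
        JOIN clients AS c ON c.id = p.client_id
        WHERE c.zip = ?
        """,
        (zip_code,),
    ).fetchone()
    return row[0]


conn = build_demo_db()
print(total_paid_by_zip(conn, "92626"))  # total of the two Acme Corp payments
```

Because every row follows the same column layout, the question "how much was paid by clients in this zip code?" becomes one short query. That predictability is what unstructured data lacks.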

And then there’s unstructured data.

Unstructured Data

On the other side of the data tide pool is document review’s problem child: unstructured data, which is responsible for creating at least ninety percent of all e-discovery headaches (give or take).

One of the main reasons unstructured legal data is so problematic is that there’s just so much of it. Unstructured information can be either human-generated or machine-generated, and it makes up the bulk of information we’re used to encountering on a daily basis.

For example, here are some of the types of unstructured data most people are probably familiar with, in both the human-generated and machine-generated categories:

Text Documents: Office memos, briefs, and presentations
Electronic Communication: Emails, IM chats, and Zoom calls
Photo Sharing: Shutterfly, Flickr, and Instagram
Document Sharing: Google Drive, Dropbox, and iCloud
Social Media: Instagram, TikTok, and Twitter
Mobile Data: Text messaging, voicemail, and phone location data
Satellite Imagery: Weather patterns, military movements, and Google Maps
Scientific Data: Seismic activity, atmospheric information, and space exploration
Digital Surveillance: CCTV, jailhouse recordings, and digital software observation
Sensor Data: Traffic accident reports, storm tracking, tsunami and other oceanographic sensors

While these lists aren’t exhaustive, they at least give a good general sense of what types of information fall under unstructured data. And when you consider how much is being produced daily, not just by the random cat mommy on TikTok but by corporations around the world, the amount is staggering.

This brings us to the second problem with unstructured information: its lack of organization.

Unlike its structured counterpart, unstructured ESI doesn’t follow any kind of organizational rhyme or reason. As its very name implies, this type of data is chaos. It is generated by countless sources, in at least as many formats, and even when it straddles the line of being semi-structured (as is the case with email or IM chats), things are still messy, making it a nightmare to collect, review, filter, and catalogue during electronic discovery.
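Email is a handy illustration of what "semi-structured" means in practice. The sketch below uses Python's standard email package on a made-up message (the addresses and wording are hypothetical). The headers parse cleanly into named fields, while the body comes back as a single block of free text that still needs a human or a model to interpret.

```python
from email import message_from_string

# A made-up message, used only for illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 contract draft
Date: Mon, 01 Mar 2021 09:30:00 -0800

Bob, the indemnity clause still looks off. Call me before we send it.
"""


def split_email(raw):
    """Return (headers, body): the structured part and the free-text part."""
    msg = message_from_string(raw)
    # Headers behave like fields in a table: fixed names, short values.
    headers = {name: msg[name] for name in ("From", "To", "Subject", "Date")}
    # The body has no schema at all; it is simply whatever the sender typed.
    body = msg.get_payload().strip()
    return headers, body


headers, body = split_email(RAW_EMAIL)
print(headers["Subject"])
print(body)
```

Sorting or filtering on the sender, the date, or the subject line is easy. Deciding whether the body is relevant to a dispute is the hard part, and that is where the volume problem bites.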


Because unstructured legal data has so many variables and unsearchable moving parts, a unifying solution is needed—a data strategy that quickly and easily locates any information a firm wants, without needing an A.I. expert on call.

Objects and Images: Search and Destroy

Unstructured data isn’t limited to text information, so why should its solution be?

A.I. programs give legal professionals the ability to expand search parameters into the realm of videos and photos, not just text. With a set of fully customizable tools at their disposal, attorneys can scan unstructured media for faces, objects, logos, and so much more.

With that information identified, these programs can then be used to remove sensitive and personal details from the records, blurring faces, redacting license plate numbers, and stripping other identifying personal information from video and photos faster and more accurately than ever before.

These tools enable firms to effectively meet compliance requirements, while also protecting sensitive information from the public eye. In addition, they have the added perk of granting attorneys valuable, early case insights.

Establishing Early Case Insights

As if the ability to search images and videos weren’t already great enough, A.I. programs now let attorneys transform audio, video, text, and other unstructured media into early case insights.

Early case assessment helps attorneys identify key issues in a case, highlighting a suit’s strengths and weaknesses, identifying important search terms, quantifying the amount of relevant data, and flagging potential costs and liabilities that may arise. These insights can then be used to narrow the focus of a budding lawsuit, ultimately saving both clients and legal professionals untold hours and dollars.
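As a rough illustration of "quantifying the amount of relevant data," the sketch below counts how many documents mention each candidate search term. It is a deliberately simple keyword tally, not a description of how any particular A.I. product works, and the file names, text, and terms are all hypothetical.

```python
import re
from collections import Counter


def term_hit_counts(documents, terms):
    """Count how many documents contain each term as a whole word (case-insensitive)."""
    counts = Counter()
    for text in documents.values():
        lowered = text.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, lowered):
                counts[term] += 1
    return counts


# Hypothetical already-extracted text, keyed by a made-up file name.
docs = {
    "memo_001.txt": "The indemnity clause was revised after the audit.",
    "email_014.txt": "Please forward the audit results to outside counsel.",
    "chat_203.txt": "Lunch at noon?",
}

hits = term_hit_counts(docs, ["audit", "indemnity", "merger"])
for term in ("audit", "indemnity", "merger"):
    print(term, hits[term])
```

Even a tally this crude gives a first sense of which terms are worth pursuing and how large the review set might be. Transcribed audio or video would feed into the same kind of count once it has been converted to text.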

And it’s not just us.

For the global community, these tools couldn’t have come at a better time.

Daniel Wong
Marketing Director at Veritone, Inc.
Daniel Wong is currently the Marketing Director for Veritone, Inc. (NASDAQ: VERI), a leader in enterprise artificial intelligence (AI) solutions for commercial and regulated industries such as Media & Entertainment, Energy, Government, Legal, and Compliance, to name a few. With more than 25 years of hardware and software product management and product marketing experience in the technology sector, Daniel has been fortunate enough to be part of the explosion of bleeding-edge technologies such as wireless networking, the Internet of Things, and now artificial intelligence.
