
Legal AI Series [Chapter Eight]: AI-Powered Redaction: Protect Your Docs, and Media, Too


So far, our discussions about the AI legal revolution have revolved around the many ways artificial intelligence is helping to combat the unstructured data crisis looming over the legal industry. However, one area we’ve yet to address is what to do about all the personally identifiable information (PII) that’s often contained within unstructured media files.

Unlike in days gone by, redacting unstructured data can’t be accomplished by hand with a box of good Sharpies. Instead, you need a computer program: automated redaction software that’s not only capable of removing sensitive text from documents, but that’s also equipped to handle faces, objects, and information inside unstructured media. It should be fast, cost-effective, and fully customizable to meet the fluctuating needs of any compliance request, all without damaging any original files.

Until recently, getting all of that in a single platform has been a tall order. However, with automated redaction software, attorneys can now have their cake, and eat it too.

In this article, we’ll talk about what PII is, how unstructured data has expanded its scope, and how AI can both better protect individual privacy and streamline the e-discovery redaction process at the same time.


First off, let’s talk about personally identifiable information, and why it’s so important to protect.

The phrase itself is a mouthful (unfortunately, not of the cake variety), which is why most people simply shorten it to its acronym, PII. According to the Department of Homeland Security, PII is any information that either directly reveals a person’s identity or gives enough hints for someone to infer it.

PII is roughly divided into two categories:

  1. Sensitive PII
  2. Non-sensitive PII

Here’s how they differ from each other.

Sensitive PII vs. Non-sensitive PII

As you might already have inferred, sensitive PII is the big no-no. This type of information is pretty obviously personal, and court rules (Federal Rule of Civil Procedure 5.2, along with the rules of most states) require law firms to redact certain identifiers, such as Social Security and financial account numbers, before filing documents publicly. Sensitive PII includes things like a person’s:

  • Home or email address
  • Cell phone or landline number(s)
  • Social Security Number
  • Passport or immigration information
  • Financial account numbers
  • Medical records
  • Photos and videos (particularly of the face, or other identifying features)
  • Biometric data (such as retina scans, voice signatures, or facial geometry)
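For the text-based items on this list, a rough idea of how automated detection works can be sketched with pattern matching. The snippet below is only an illustration, not how any particular product does it: the pattern names and the `redact_text` helper are hypothetical, the patterns cover just a few US-style formats, and real redaction software would pair rules like these with trained models. Note that it returns a redacted copy and leaves the original text alone.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
# Real PII detection needs far broader coverage than three regexes.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def redact_text(text: str) -> str:
    """Return a redacted copy of `text`; the input itself is not modified."""
    redacted = text
    for label, pattern in PII_PATTERNS.items():
        # Replace every match with a labeled placeholder.
        redacted = pattern.sub(f"[{label} REDACTED]", redacted)
    return redacted
```

Calling `redact_text` on a sentence containing an email address, a phone number, and a Social Security Number would swap each one for a labeled placeholder, while the original string stays available for the case file.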

On the other hand, you have non-sensitive PII. This category includes things that don’t quite spell out your name and address with a flourish, but definitely flirt a little too close to home for complete comfort. We mean information like your:

  • Birthday
  • Race
  • Gender
  • Zip code
  • Place of birth
  • Religion

While these things aren’t as invasive as sensitive PII, non-sensitive PII might still need to be redacted if it’s combined with other details that, together, point to a specific person.

For example, if I had a document that read: “The following schedule is the personal itinerary for that one person who runs the country, and lives in the big, white house in Washington D.C….” you’d obviously know who I was talking about, even without me having to actually reveal any sensitive PII.

Hence, it’s sometimes necessary for firms to flag and redact even non-sensitive information, before disclosing certain documents.
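To make the "combination" idea concrete, here is a minimal sketch of how a review tool might flag records that carry several non-sensitive identifiers at once. The field names, the `needs_review` helper, and the threshold of three are all assumptions made for illustration; an actual policy would come from counsel and the governing rules.

```python
# Hypothetical field names mirroring the non-sensitive PII list above.
QUASI_IDENTIFIERS = {
    "birthday", "race", "gender", "zip_code", "place_of_birth", "religion",
}


def needs_review(record: dict, threshold: int = 3) -> bool:
    """Flag a record that holds `threshold` or more non-empty quasi-identifiers.

    The threshold is an illustrative assumption, not a legal standard.
    """
    present = [field for field in QUASI_IDENTIFIERS if record.get(field)]
    return len(present) >= threshold
```

A record with only a zip code would pass through, while one with a birthday, gender, and zip code together would be flagged for a human to look at.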

Failure to Redact PII

The consequences for not redacting PII in discovery can range from embarrassing, to expensive, all the way up to civil liability, and—depending on the circumstances—even criminal charges. And that’s just on the home front.

Abroad, EU lawmakers have thrown down a pretty serious data-protection gauntlet in the form of the General Data Protection Regulation (GDPR), which took effect in 2018. Under these regulations, even companies located outside of the EU can face painfully steep fines for the unauthorized release of an individual’s personal data.

Bottom line? Redacting PII in discovery isn’t something you want to mess up.

That being said, for the modern attorney, PII redaction can be a lot more difficult than it might sound, especially when you consider that legal professionals are dealing not just with text documents, but with unstructured media files, too.


The inevitable side effect of the unstructured data boom is that capturing a person’s PII is as easy as snapping a photo or pressing record. Remember, PII is not limited to text alone. Sensitive PII also includes facial features and biometric data, the kind we unintentionally collect every time we make a recording in a public place.

With PII lurking within the background pixels of untold numbers of unstructured media files, the challenges of e-discovery redaction have become a new kind of monster altogether. A time-consuming, expensive task, with a high likelihood of human error. One that requires attorneys to laboriously scour individual photos, videos, and audio files, searching for faces, objects, and words to blur, mute, and obscure by hand.

Considering that, on average, every attorney loses over eleven hours per week to document-review-related problems, it’s no wonder that attorneys on both sides of the aisle are simply agreeing to leave out this evidence altogether.

However, anyone who’s been paying attention to current technological trends knows that this is fast becoming a non-option.

Around the globe, the number of cases relying almost exclusively on unstructured data is climbing, so it’s no longer enough for AI solutions to offer text redaction alone. Instead, attorneys need a program that comes equipped with automated redaction software: machine learning algorithms that can find and redact PII not only in text, but in audio, photo, and video files, too.
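As one example of what redaction looks like outside of text, consider audio. A common approach is to work from a transcript with word-level timestamps and mute the stretches where flagged words are spoken. The sketch below assumes those timestamps already exist (say, from a speech-to-text step) and only computes which time ranges to mute; the `mute_intervals` name, the tuple layout, and the quarter-second padding are hypothetical choices for illustration. The actual muting would be handled by an audio tool working on a copy of the file.

```python
from typing import List, Set, Tuple

# Assumed transcript format: (word, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def mute_intervals(
    words: List[Word], flagged: Set[str], pad: float = 0.25
) -> List[Tuple[float, float]]:
    """Return merged (start, end) ranges covering every flagged word.

    `flagged` should hold lowercase words; `pad` widens each range so that
    clipped syllables on either side are muted as well.
    """
    spans = sorted(
        (max(0.0, start - pad), end + pad)
        for text, start, end in words
        if text.lower().strip(".,!?") in flagged
    )
    merged: List[Tuple[float, float]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range, so extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging overlapping ranges means a full name spoken as two adjacent words becomes a single mute, rather than two cuts with a sliver of audio in between.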

Protecting confidential information is one of the biggest concerns with any large-scale e-discovery project. Luckily, legal professionals can now redact PII within text and handle the complexities of unstructured media, as well. And, when that capability is used in conjunction with an AI-powered early case assessment tool, modern attorneys are not only capable of tackling the biggest hurdles in document review, they also have an added superpower…

They can tell the future.

With AI algorithms that can recognize and analyze patterns within evidence, legal professionals can now receive early case insights, which can help predict a lawsuit’s optimal direction, all before attorneys even know what they’re dealing with.

And if that isn’t telling the future, we don’t know what is.

Daniel Wong
Marketing Director at Veritone, Inc.
Daniel Wong is currently the Marketing Director for Veritone, Inc. (NASDAQ: VERI), a leader in enterprise artificial intelligence (AI) solutions for commercial and regulated industries such as Media & Entertainment, Energy, Government, Legal, and Compliance, to name a few. With more than 25 years of hardware and software product management and product marketing experience in the technology sector, Daniel has been fortunate enough to be part of the explosion of bleeding-edge technologies such as wireless networking, the Internet of Things, and now artificial intelligence.
