eDiscovery Data Storage: The Obvious and Not-So-Obvious Costs

January 18, 2024/ACEDS Blog, Data and Technology


What is a Gigabyte?

Seems really obvious, right? But maybe not. At the time of collection, a gigabyte may be data downloaded from O365 or Gmail, information extracted from a cell phone, or even (historically) a whole disk image. These gigabytes are considered raw unprocessed collected data.

But interestingly, most collected data won’t be processed or loaded to a review tool. Even if it makes it into a review tool, most of it probably won’t be relevant.

When examining online storage gigabyte charges, it’s important to understand which gigabyte is being counted. For instance, in the iCONECT platform, a storage report is automatically generated every night that tells us the total number of gigabytes a particular project is consuming across all facets of storage within the system. This includes native files and both inbound and outbound produced documents, and it also details how much data is stored in SQL, analytics servers, and search indexes. Something similar happens in other eDiscovery tools.

In some cases, a project’s native files may be very small (for instance, when someone has loaded a bunch of source code for review). Because this type of data is very text rich, the size of the native files will be almost the same as the size of the data stored in SQL and in the search indexes. However, if we load thousands of pictures taken from a cell phone, the native files, irrespective of their resolution, will be many times larger than the SQL data or search index, because the textual and metadata content of those documents is very small.

It’s important to understand what’s in the gigabyte that you’re paying for and how that gigabyte is stored and backed up.
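
To make the contrast concrete, here is a minimal Python sketch that breaks a storage report down by facet. The function name, the facet names, and every figure are hypothetical and chosen only for illustration; they do not come from iCONECT or any other platform's actual report.

```python
def storage_shares(report_gb):
    """Return each storage facet's share of the project total, as a percentage."""
    total = sum(report_gb.values())
    if total == 0:
        return {facet: 0.0 for facet in report_gb}
    return {facet: round(100 * gb / total, 1) for facet, gb in report_gb.items()}


# Hypothetical figures, in gigabytes, for two very different projects.
source_code_project = {"native": 10.0, "sql": 9.0, "search_index": 9.0}
phone_photo_project = {"native": 200.0, "sql": 2.0, "search_index": 1.0}

for name, report in [
    ("source code", source_code_project),
    ("phone photos", phone_photo_project),
]:
    print(name, storage_shares(report))
```

With these assumed numbers, native files make up roughly a third of the text-rich project but nearly all of the photo project, which is why the same billed gigabyte can mean very different things.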

How to Save?

Storing data in the most expensive class of storage in an eDiscovery tool may not make the most sense in every case. There are a few ways to save:

Remove the source data for information that has already been processed or loaded for review
Investigate near-line or offline storage

The bulk of the data volume for most projects is the native files and production images (sent or received). In general, these files are numerous but small and can be removed from the online storage when a litigation matter has gone on hiatus.

Offline vs. Online Storage

Storing data offline on hard drives is much more economical than maintaining it with an eDiscovery platform that charges monthly gigabyte fees. To understand more about why, let’s look at the economics from the hosting provider’s perspective. Best practice is to store any information that is online twice, and to maintain backups for certain periods of time.

So that one online gigabyte is stored at least three times in an easily retrievable way, and often many more times than that once backups are included. A quick note here about the environmental impact of online storage: it is much higher than that of offline data, since hard drives at rest don’t have the same power, air conditioning, and regular maintenance needs.
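
The arithmetic behind this can be sketched in a few lines of Python. This is an illustration only: the default of three online copies follows the figure above, while the backup count, the project size, and the monthly rate per gigabyte are assumptions, not quoted prices or any provider's actual policy.

```python
def physical_gb(billed_gb, online_copies=3, backup_copies=0):
    """Gigabytes the provider actually keeps for a given billed amount."""
    return billed_gb * (online_copies + backup_copies)


def monthly_hosting_fee(billed_gb, rate_per_gb):
    """Monthly fee when hosting is billed per gigabyte per month."""
    return billed_gb * rate_per_gb


# Hypothetical project: 500 billed gigabytes, 4 retained backups, $5 per GB per month.
billed = 500
print("Physical GB kept:", physical_gb(billed, backup_copies=4))
print("Monthly fee ($):", monthly_hosting_fee(billed, 5.0))
```

Under these assumptions, 500 billed gigabytes correspond to 3,500 gigabytes kept by the provider, and the fee recurs every month the data stays online.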

Why Does This Matter?

With current market rates for eDiscovery hosting finally approaching the sub-$5 range, it may not seem necessary to find creative solutions for a class of service as expensive as eDiscovery hosting. However, because larger data sizes offset those lower rates, the question has become more relevant.

Thinking forward, it was expected that by now the need to store processed or unprocessed native data in the eDiscovery platform, in addition to its original source location, would have been solved. Yet the complexity of online storage and cloud providers remains relatively high. Some of these technologies and workflows are being addressed by solutions providers, so look to the industry for some of those changes in the coming year or two.

Conclusion

The importance of maintaining original collected data in an offline state will not change, because we will always need to go back to the source at some stage of litigation, whether for verification, advancements in processing technology, or access by opposing experts. Even if your eDiscovery provider doesn’t allow for offline storage, there are creative workflows that can greatly reduce the storage fees billed by these vendors. As we see new innovation in eDiscovery storage, we will be able to greatly reduce its electrical burden. In the same way we have become conscious of printing paper, there are similar considerations about keeping unnecessary data online.

Dr. Gavin Manes

CEO at Avansic

Dr. Gavin Manes is a nationally recognized eDiscovery and digital forensics expert. He founded Avansic in 2004 after completing his Doctorate in Computer Science from the University of Tulsa. At Avansic, Dr. Manes is committed to high-technology innovation, research, and mentorship, and has several patents pending. Avansic's scientific approach to eDiscovery and digital forensics stems from his academic experience.

Dr. Manes routinely serves as an expert witness including consulting with attorneys on data preservation issues. He contributes academic content to peer-reviewed journals and delivers classroom lectures. See his full CV at gavinmanes.com.

Dr. Manes has published over fifty papers on eDiscovery, digital forensics, and computer security, countless blog posts, and educational presentations to attorneys, executives, professors, law enforcement, and professional groups on topics from eDiscovery to cyber law. He’s briefed the White House, the Department of the Interior, the National Security Council, and the Pentagon on computer security and forensics issues.

At the University, Dr. Manes formed the Tulsa Digital Forensics Center, housing Cyber Crime Units from local, state, and federal law enforcement agencies. He’s a founder of the University of Tulsa’s Institute for Information Security, leading the creation of nationally recognized research efforts in digital forensics and telecommunications security.