When considering the EDRM (Electronic Discovery Reference Model) and how it applies to the cloud in general, and to Office 365 specifically, a common question concerns the processing stage: what does processing mean when the data lives within the Office 365 environment, and does it still apply to the eDiscovery workflow?
Historically, processing took place after data had been collected from various repositories, such as local email servers, desktops, laptops, and shared file servers. The native data was run through a processing application to make it accessible in a usable form, so that search terms and other culling techniques could be applied. During this stage the data typically underwent several procedures to prepare and cull the collection, including text and metadata extraction, deduplication, and De-NISTing (removing known system and application files listed in the NIST National Software Reference Library). Processing would also flag any errors and produce reports on data volumes and search results. This step can be expensive, because vendors charge based on the volume of data they receive for processing and storage.
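Two of the culling procedures mentioned above, De-NISTing and deduplication, both come down to comparing file hashes. The sketch below shows the idea under stated assumptions: it hashes raw file bytes with SHA-1, treats a hypothetical `known_file_hashes` set as a stand-in for a reference list such as the NSRL, and only catches exact duplicates. Commercial processing tools are more elaborate (for example, email deduplication often hashes selected metadata fields rather than raw bytes), so read this as an illustration, not a vendor's implementation.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Item:
    path: str
    content: bytes


def sha1_of(data: bytes) -> str:
    """Return the hex SHA-1 digest of an item's raw bytes."""
    return hashlib.sha1(data).hexdigest()


def denist_and_dedupe(items, known_file_hashes):
    """Remove known system files (De-NIST), then exact duplicates.

    known_file_hashes: hypothetical set of SHA-1 hex digests standing in
    for a reference list of known software files (such as the NSRL).
    Returns (kept_items, report); the first copy of each duplicate is kept.
    """
    seen = set()
    kept = []
    report = {"input": 0, "denisted": 0, "duplicates": 0, "kept": 0}
    for item in items:
        report["input"] += 1
        digest = sha1_of(item.content)
        if digest in known_file_hashes:
            report["denisted"] += 1  # known system/application file
            continue
        if digest in seen:
            report["duplicates"] += 1  # identical bytes already kept
            continue
        seen.add(digest)
        kept.append(item)
    report["kept"] = len(kept)
    return kept, report


if __name__ == "__main__":
    system_file = b"bytes of a stock operating-system library"
    items = [
        Item("laptop/Windows/System32/example.dll", system_file),
        Item("laptop/Documents/report.docx", b"Q3 report draft"),
        Item("fileshare/Finance/report_copy.docx", b"Q3 report draft"),
        Item("mail/attachment.txt", b"meeting notes"),
    ]
    kept, report = denist_and_dedupe(items, {sha1_of(system_file)})
    print(report)  # {'input': 4, 'denisted': 1, 'duplicates': 1, 'kept': 2}
    print([item.path for item in kept])
```

The report dictionary mirrors the volume reporting that traditional processing produces: it records how many items were removed at each step, which helps explain why the reviewed set is smaller than the collected set.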
Office 365 offers a different approach: much of the eDiscovery process can be performed without moving your data outside of Microsoft 365. For example, preservation can be done in place, without collecting any data or disrupting a custodian’s productivity, simply by placing a legal hold on the users’ data. The hold preserves the data even if the custodian tries to delete it. The data within Office 365 is also indexed in real time to make it searchable, so search criteria can be revised quickly as a case progresses, with minimal cost impact. Additionally, analytics such as near-duplicate detection and email threading, along with some review capabilities, are available within the platform to further limit the data set that needs to be reviewed.
If all these capabilities are available with the data in place, is processing now obsolete? Not entirely. Office 365 does not eliminate processing, but it greatly reduces the need for this expensive step. The main goal of processing is to make data searchable, and the vast majority of Office 365 data is already in that state; however, some data will still need to go through the traditional processing stage.
In a typical corporate environment, approximately 97% of the items in Office 365 will be fully searchable using the standard Office 365 eDiscovery search, while the remaining 2-3% are considered unindexed (partially searchable or not searchable at all). An example of a partially searchable item is an email whose metadata (e.g., sender, recipients, and date), subject line, and most of the body are searchable, but which contains an embedded picture that is not. An item may be entirely unsearchable because Office 365 does not recognize its file type, because it uses non-Office 365 encryption (content encrypted with Office 365 encryption methods remains searchable), because it is multimedia such as images, audio, or video, or because the file is corrupted.
Unindexed items will normally need to be exported out of Office 365 and processed to make them searchable. However, because the metadata of these items is usually still searchable, queries that do not rely on keywords (for example, filters on date range, custodian, or file type) can typically be applied first to further limit the volume that requires processing. Combined with the large share of data that is already fully searchable, this reduces the overall volume and can cut eDiscovery costs dramatically.
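The metadata-only culling described above can be sketched as a simple filter. This is a minimal illustration, not the Office 365 search engine: the `UnindexedItem` record, its field names, and the `metadata_cull` function are all hypothetical, and in practice the equivalent conditions would be expressed as search query conditions inside the platform. The point it shows is that custodian, date, and file type can narrow the unindexed set without any keyword matching against content.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UnindexedItem:
    item_id: str
    custodian: str
    sent: date
    extension: str  # e.g. ".mp4"; compared case-insensitively


def metadata_cull(items, custodians, start, end, excluded_extensions=frozenset()):
    """Keep unindexed items whose metadata falls within the case scope.

    No keyword search is used: only custodian, an inclusive date range and
    file extension, all of which stay searchable when content does not.
    excluded_extensions must be lowercase, e.g. {".exe"}.
    """
    return [
        item
        for item in items
        if item.custodian in custodians
        and start <= item.sent <= end
        and item.extension.lower() not in excluded_extensions
    ]


if __name__ == "__main__":
    unindexed = [
        UnindexedItem("A1", "alice", date(2019, 3, 2), ".PNG"),
        UnindexedItem("A2", "alice", date(2017, 6, 1), ".pdf"),
        UnindexedItem("B1", "bob", date(2019, 5, 9), ".mp4"),
        UnindexedItem("C1", "carol", date(2019, 4, 4), ".zip"),
    ]
    to_process = metadata_cull(
        unindexed,
        custodians={"alice", "bob"},
        start=date(2019, 1, 1),
        end=date(2019, 12, 31),
        excluded_extensions={".mp4"},  # hypothetical out-of-scope file type
    )
    print([item.item_id for item in to_process])  # ['A1']
```

Only the items that survive this filter would be exported for processing; everything else is excluded on metadata alone.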
The Advanced eDiscovery solution in Microsoft 365 adds deeper indexing of many of these items, reducing the unindexed volume even further, typically to a few tenths of a percent. It does this by performing OCR on most image files and by extending indexing support to hundreds of file types. Advanced eDiscovery also provides detailed reporting on the items that remain unsearchable, which in many cases lets the user determine whether those items need to go through an error remediation process or can be safely ignored. These additional in-place processes keep the data within your security and compliance boundaries for as long as possible and reduce the overall cost of eDiscovery.
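To put these percentages in perspective, the short calculation below applies them to a hypothetical collection of one million items. The collection size and the 0.3% figure (one value within "a few tenths of a percent") are illustrative assumptions, not measurements; real unindexed rates vary by organization and data mix.

```python
def processing_volume(total_items, unindexed_fraction):
    """Items that would still need to be exported and processed externally."""
    return round(total_items * unindexed_fraction)


TOTAL_ITEMS = 1_000_000  # hypothetical collection size

# Hypothetical fractions chosen to match the ranges described above.
scenarios = {
    "traditional processing of everything": 1.0,
    "standard search (about 3% unindexed)": 0.03,
    "Advanced eDiscovery (about 0.3% unindexed)": 0.003,
}

if __name__ == "__main__":
    for label, fraction in scenarios.items():
        print(f"{label}: {processing_volume(TOTAL_ITEMS, fraction):,} items")
```

Under these assumptions the volume sent to external processing drops from 1,000,000 items to 30,000 with standard search, and to 3,000 with deeper indexing, before any metadata-based culling is applied.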
While Office 365 does not fully eliminate the need for processing, it greatly reduces the cost and time this stage would normally add to the overall workflow, providing a defensible approach to eDiscovery that can be supported within the client’s own environment. Understanding the Office 365 workflow and how it affects corporate eDiscovery is essential for any modern eDiscovery professional.