Metadata: the data about the data, the extra goodies, the secret bits that tell you the when, where, and how of a file. Everyone who has used eDiscovery for a while knows some metadata fundamentals. But it never hurts to have a quick review because, as with all things in life, a refresher can do exactly what it sounds like: freshen things up a bit. Plus, as all of us have also experienced, computers tend to interpret things a bit unexpectedly sometimes (hello, blue screen of death!) so it never hurts to get a leg up on the electronic devices in your life.
Common Metadata Fields
Inconveniently for those of us who like things easy, metadata fields are not standard across applications. There is no mandated number of fields, set of field types, or naming convention. But in general, these are usually present in some form:
Author
- Identifies the person who created the document
- Useful in determining document origins and the author’s role in the litigation
Date Created
- Indicates the date and time when the document was originally created or the date and time when a file was created on a file system
- Helpful in establishing a timeline
Date Modified
- Indicates the date and time when the document was last modified or the date and time when a file was last changed on a file system
- Helpful in determining if changes were made after the document was created
Last Modified By
- Identifies the person who last modified the document
Document Title
- Provides a brief description of the document’s content
Document Type
- Identifies the file format of the document, such as PDF, Word, or Excel
- Useful in determining how the document can be accessed and viewed
File Size
- Provides the size of the document in bytes
- Useful in determining the amount of storage space required for the document and whether it can practically be transmitted electronically
Metadata fields are dictated by the creator of the application and include whatever data they find useful; digital camera data may contain fields regarding aperture settings, color saturation, scaling, zoom level, and GPS location. These would not be present in the metadata for a PDF document.
Combining Metadata
So you might see where we’re going with this – if different documents have different fields, how can you see the information you want in any given data set? Not going to lie, merging metadata from different sources can be tricky.
Almost every eDiscovery case pulls documents from many different programs, so standardizing the field names is the first critical step. For example, CreateDate, Create_Date, Created-Date, and CRDATE all contain information about when a document was created, but comparing all these columns would be a pain. Better to have it all in one place, and that’s where field mapping comes in. This means telling your eDiscovery software that the values in a particular source field should be directed to a single standard field. In most modern tools field mapping is automated (sometimes using AI) to detect what the field types are, what they should be mapped to, and whether they are valid. For tools that don’t have this automation, eDiscovery experience is necessary to get the right information in the right place.
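To make field mapping concrete, here is a minimal sketch in Python. It is not how any particular eDiscovery tool does it; the alias table, the standard name `date_created`, the `normalize_record` helper, and the sample records are all made up for illustration. Only the four variant field names come from the example above.

```python
# Hypothetical alias table. The variant names are from the example above;
# the standard name "date_created" is an assumption for illustration.
FIELD_ALIASES = {
    "createdate": "date_created",
    "create_date": "date_created",
    "created-date": "date_created",
    "crdate": "date_created",
}


def normalize_record(record):
    """Return a copy of record with known field-name variants renamed."""
    normalized = {}
    for name, value in record.items():
        # Match case-insensitively; unknown fields keep their original name.
        key = FIELD_ALIASES.get(name.strip().lower(), name)
        # If two variants collide, keep the first non-empty value.
        if key not in normalized or not normalized[key]:
            normalized[key] = value
    return normalized


# Made-up sample records standing in for exports from two different programs.
records = [
    {"CreateDate": "2021-03-04", "Author": "A. Smith"},
    {"CRDATE": "2020-11-30", "Author": "B. Jones"},
]
normalized = [normalize_record(r) for r in records]
```

After this step both records carry their creation date under the same key, so the values can be sorted or filtered as one column. A real mapping would also need to reconcile date formats and time zones, which this sketch ignores.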
Metadata as Trickster
There are a few places where a value in a metadata field may seem like common sense – but this is data after all, here to confuse us. The most commonly misunderstood fields are Last Accessed, Last Modified Date, and Creation Date.
Creation date is a misleading term since the human definition and a computer’s definition are very different. Most people assume that it represents the first time the file was made, wherever that may have been. However, a Microsoft Windows computer defines creation date as the time the file was created in its current location on that file system. If you copy a file from another digital device (like a thumb drive), the creation date of the file on your system will be the date that you copied it.
Note that there are different types of creation dates. For instance, a Microsoft Word file will have at least two creation dates: one is the file system creation date and the other is the internal creation date related to when the Word document was made.
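The two kinds of creation date can be read side by side. The sketch below assumes a .docx file, which is a zip package that normally stores its internal properties in `docProps/core.xml`; the helper names `internal_created` and `filesystem_times` are made up for this example. File system creation time is only exposed as `st_birthtime` on some platforms and Python versions, so the sketch returns `None` where it is unavailable.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core terms namespace used for the created/modified elements
# in docProps/core.xml.
DCTERMS = "{http://purl.org/dc/terms/}"


def internal_created(docx_path):
    """Return the creation timestamp stored inside a .docx, or None."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    element = root.find(DCTERMS + "created")
    return element.text if element is not None else None


def filesystem_times(path):
    """Return file system timestamps as seconds since the epoch."""
    st = os.stat(path)
    return {
        "modified": st.st_mtime,
        # Not every platform exposes a creation ("birth") time.
        "created": getattr(st, "st_birthtime", None),
    }
```

Comparing `internal_created(path)` with `filesystem_times(path)["created"]` for a copied document will often show the internal date well before the file system date, which is the gap described above.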
The modified date represents the last time that new data was saved to the file. Therefore, if you open an Excel document, make changes, and save it, the modified date will be changed. Some programs may prompt you to save even if you haven’t made any obvious changes; Microsoft Word is a good example. If you open a document, print it, make no alterations to the text, and then save, the modified date will change. Here’s an example of situations where each type of field would be altered:
Word Document | Created | Modified |
Open and save (whether or not changes are made) | | X |
Open and don’t save | | |
Change filename without opening | | |
Download and save an email attachment | X | X |
Copy file to another location | X | |
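A few of the rows in the table above can be checked with a short Python experiment. This is a sketch under stated assumptions: it uses a plain text file in a temporary folder rather than a Word document, the file names and the backdated timestamp are arbitrary, and `shutil.copy2` is used because it tries to preserve the modified time (plain `shutil.copy` does not). The creation date of the copy is set by the operating system and is not examined here.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "report.txt")
with open(original, "w") as handle:
    handle.write("draft")

# Backdate the modified time so any change is easy to spot.
old_time = 1_000_000_000  # arbitrary: 2001-09-09 UTC, seconds since epoch
os.utime(original, (old_time, old_time))

# Changing the filename without opening leaves the modified time alone.
renamed = os.path.join(workdir, "report_final.txt")
os.rename(original, renamed)
mtime_after_rename = int(os.stat(renamed).st_mtime)

# Copying with copy2 carries the modified time over to the copy.
copied = os.path.join(workdir, "report_copy.txt")
shutil.copy2(renamed, copied)
mtime_after_copy = int(os.stat(copied).st_mtime)

# Saving new data to the file updates the modified time.
with open(copied, "a") as handle:
    handle.write(" v2")
mtime_after_save = int(os.stat(copied).st_mtime)
```

In this run `mtime_after_rename` and `mtime_after_copy` both still equal `old_time`, while `mtime_after_save` jumps to the present. Other copy methods, sync tools, and cloud storage clients may treat timestamps differently, so it is worth testing the specific tool involved in a matter.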
Conclusion
Along with a document’s content, metadata is incredibly important in understanding what’s really going on with documents in a data set. Carefully considered and used well, it can help you gain a deeper understanding of what’s relevant in both eDiscovery and digital forensics.