
Metadata Remixed


Metadata: the data about the data, the extra goodies, the secret bits that tell you the when, where, and how of a file. Everyone who has used eDiscovery for a while knows some metadata fundamentals. But it never hurts to have a quick review because, as with all things in life, a refresher can do exactly what it sounds like: freshen things up a bit. Plus, as all of us have also experienced, computers tend to interpret things a bit unexpectedly sometimes (hello, blue screen of death!) so it never hurts to get a leg up on the electronic devices in your life. 

Common Metadata Fields 

Inconveniently for those of us who like things easy, metadata fields are not standardized across applications. There is no mandated number of fields, set of field types, or naming convention. In general, though, these fields are usually present in some form:

Author
  • Identifies the person who created the document
  • Useful in determining document origins and the author’s role in the litigation
Date Created
  • Indicates the date and time when the document was originally created or the date and time when a file was created on a file system
  • Helpful in establishing a timeline
Date Modified
  • Indicates the date and time when the document was last modified or the date and time when a file was last modified on a file system
  • Helpful in determining if changes were made after the document was created
Last Modified By
  • Identifies the person who last modified the document
Document Title
  • Provides a brief description of the document’s content
Document Type
  • Identifies the file format of the document, such as PDF, Word, or Excel
  • Useful in determining how the document can be accessed and viewed
File Size
  • Provides the size of the document in bytes
  • Useful in determining the amount of storage space required for the document and how easily it can be transmitted electronically
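To see which of these fields the file system itself can supply, here is a minimal Python sketch using only the standard library. The function name file_system_metadata and its output keys are hypothetical, and treating the file extension as the Document Type is a simplifying assumption (forensic tools generally inspect the file's contents instead). Fields such as Author and Document Title are stored inside the file by the application, so os.stat() cannot report them.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_system_metadata(path: str) -> dict:
    """Collect file-system-level fields for one file.

    Author, Last Modified By, and Document Title live inside the file
    (application metadata), so they are not available from stat().
    """
    p = Path(path)
    st = p.stat()
    return {
        "FileName": p.name,
        # Assumption: the extension stands in for Document Type.
        "DocumentType": p.suffix.lstrip(".").upper() or "UNKNOWN",
        "FileSizeBytes": st.st_size,
        "DateModified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sample = os.path.join(d, "memo.pdf")  # hypothetical sample file
        with open(sample, "wb") as f:
            f.write(b"%PDF-1.4 sample")
        print(file_system_metadata(sample))
```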

Metadata fields are dictated by the creator of the application and include whatever data they find useful; digital camera data may contain fields regarding aperture settings, color saturation, scaling, zoom level, and GPS location. These would not be present in the metadata for a PDF document. 

Combining Metadata 

So you might see where we’re going with this – if different documents have different fields, how can I see the information I want to in any given data set? Not going to lie, merging metadata from different sources can be tricky.  

Almost every eDiscovery case pulls documents from many different programs, so standardizing the field names is the first critical step. For example, CreateDate, Create_Date, Created-Date, and CRDATE all contain information about when a document was created, but comparing all these columns would be a pain. It's better to have it all in one place, and that's where field mapping comes in: telling your eDiscovery software that the values in a particular source field should be directed to a standard field. In most modern tools, field mapping is automated (sometimes using AI) to detect each field's type, which standard field it should map to, and whether its values are valid. For tools without this automation, eDiscovery experience is necessary to get the right information in the right place.
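As a rough illustration of field mapping, the Python sketch below normalizes field names and routes the four creation-date variants above to one standard field. The mapping table, the standard names (DateCreated, Author, FileSize), and the rule of setting aside unknown or conflicting fields for human review are all illustrative assumptions, not the behavior of any particular eDiscovery tool.

```python
import re

# Hypothetical mapping table: normalized source field name -> standard field.
FIELD_MAP = {
    "createdate": "DateCreated",
    "createddate": "DateCreated",
    "crdate": "DateCreated",
    "datecreated": "DateCreated",
    "author": "Author",
    "docauthor": "Author",
    "filesize": "FileSize",
}


def normalize(name: str) -> str:
    """Lower-case a field name and drop separators such as '_' and '-'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def map_record(record: dict) -> tuple:
    """Split one document's fields into (mapped, unmapped) dictionaries.

    Unknown fields, and fields that disagree with a value already mapped to
    the same standard field, are left unmapped for a person to review.
    """
    mapped, unmapped = {}, {}
    for name, value in record.items():
        target = FIELD_MAP.get(normalize(name))
        if target is None or (target in mapped and mapped[target] != value):
            unmapped[name] = value
        else:
            mapped[target] = value
    return mapped, unmapped


if __name__ == "__main__":
    records = [
        {"CreateDate": "2021-03-04", "Author": "J. Smith"},
        {"Create_Date": "2021-05-06", "DocAuthor": "A. Lee"},
        {"Created-Date": "2022-01-02"},
        {"CRDATE": "2020-12-31", "Custodian": "Sales"},
    ]
    for record in records:
        print(map_record(record))
```

Normalizing before lookup keeps the table short: one entry covers CreateDate and Create_Date alike, while genuinely different spellings such as CRDATE still need their own entry.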

Metadata as Trickster 

There are a few places where a value in a metadata field may seem like common sense – but this is data after all, here to confuse us. The most commonly misunderstood fields are Last Accessed, Last Modified Date, and Creation Dates.   

Creation date is a misleading term, since the human definition and a computer's definition are very different. Most people assume it represents the first time the file was made, wherever that may have been. However, a Microsoft Windows computer records the creation date as the time the file was created on that particular file system. If you copy a file from another digital device (like a thumb drive), the creation date of the file on your system will be the date that you copied it.

Note that there are different types of creation dates. For instance, a Microsoft Word file will have at least two creation dates: one is the file system creation date and the other is the internal creation date related to when the Word document was made. 
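The two kinds of creation date can be compared directly. A .docx file is a ZIP package whose docProps/core.xml part holds the internal dcterms:created and dcterms:modified values (plus dc:creator and cp:lastModifiedBy), while the file system keeps its own timestamps. This is a minimal Python sketch with hypothetical function names and a hypothetical sample core.xml. Python only exposes a file system creation time on some platforms (st_birthtime on macOS and BSD, and on Windows from Python 3.12; st_ctime on Windows before that), so on Linux the sketch returns None for it.

```python
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# XML namespaces used in docProps/core.xml of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def internal_properties(docx_path: str) -> dict:
    """Read the dates and names Word stores inside the .docx package."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("docProps/core.xml"))

    def text(tag: str):
        el = root.find(tag, NS)
        return el.text if el is not None else None

    return {
        "InternalCreated": text("dcterms:created"),
        "InternalModified": text("dcterms:modified"),
        "Author": text("dc:creator"),
        "LastModifiedBy": text("cp:lastModifiedBy"),
    }


def file_system_created(path: str):
    """File system creation time as ISO 8601, or None if the OS hides it."""
    st = os.stat(path)
    ts = getattr(st, "st_birthtime", None)  # macOS/BSD; Windows on Python 3.12+
    if ts is None and os.name == "nt":
        ts = st.st_ctime  # creation time on Windows before Python 3.12
    if ts is None:
        return None  # e.g. Linux: os.stat() does not report it
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


# Minimal, hypothetical core.xml for demonstration (a real .docx has more parts).
SAMPLE_CORE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<cp:coreProperties xmlns:cp="' + NS["cp"] + '" xmlns:dc="' + NS["dc"] + '"'
    ' xmlns:dcterms="' + NS["dcterms"] + '">'
    "<dc:creator>J. Smith</dc:creator><cp:lastModifiedBy>A. Lee</cp:lastModifiedBy>"
    "<dcterms:created>2019-06-01T09:00:00Z</dcterms:created>"
    "<dcterms:modified>2019-06-03T17:30:00Z</dcterms:modified>"
    "</cp:coreProperties>"
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "memo.docx")
        with zipfile.ZipFile(path, "w") as z:
            z.writestr("docProps/core.xml", SAMPLE_CORE)
        print(internal_properties(path))
        print("File system created:", file_system_created(path))
```

Run against a document copied from a thumb drive, the internal created value would typically predate the file system creation date, which reflects when the copy was made.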

The modified date represents the last time that new data was saved to the file. Therefore, if you open an Excel document, make changes, and save it, the modified date will change. Some programs may prompt you to save even if you haven't made any obvious changes; Microsoft Word is a good example. If you open a document, print it, make no alterations to the text, and then save, the modified date will still change. Here's an example of situations where each type of field would be altered:

Word Document Created Modified 
Open and save (whether or not changes are made)  
Open and don’t save   
Change filename without opening   
Download and save an email attachment 
Copy file to another location  
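The modified-date side of these scenarios can be checked on a local file system with a small Python sketch. It uses only the standard library; plain file reads and writes stand in for opening and saving in Word, and the function and file names are hypothetical. Creation dates are left out because Python reports them inconsistently across operating systems.

```python
import os
import shutil
import tempfile

# 2020-01-01T00:00:00Z in nanoseconds; backdating makes any change obvious.
OLD_NS = 1_577_836_800 * 10**9


def modified_date_unchanged(workdir: str) -> dict:
    """Return, for each action, whether the file's modified time survived it."""
    path = os.path.join(workdir, "contract.docx")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"draft v1")
    os.utime(path, ns=(OLD_NS, OLD_NS))
    results = {}

    # Open and don't save: reading alone leaves the modified time alone.
    with open(path, "rb") as f:
        f.read()
    results["open_no_save"] = os.stat(path).st_mtime_ns == OLD_NS

    # Change the file name without opening it.
    renamed = os.path.join(workdir, "contract_final.docx")
    os.rename(path, renamed)
    results["rename"] = os.stat(renamed).st_mtime_ns == OLD_NS

    # Copy to another location: copy2 carries timestamps over, copy does not.
    kept = shutil.copy2(renamed, os.path.join(workdir, "copy2.docx"))
    fresh = shutil.copy(renamed, os.path.join(workdir, "copy.docx"))
    results["copy2"] = os.stat(kept).st_mtime_ns == OLD_NS
    results["copy"] = os.stat(fresh).st_mtime_ns == OLD_NS

    # Open and save: rewriting the file, even with identical bytes,
    # sets the modified time to the current time.
    with open(renamed, "wb") as f:
        f.write(b"draft v1")
    results["open_and_save"] = os.stat(renamed).st_mtime_ns == OLD_NS
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(modified_date_unchanged(d))
```

Each value reports whether the modified date survived the action. The contrast between shutil.copy2 and shutil.copy is the practical point: whether a copy keeps the original modified date depends on the tool that made the copy.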

Conclusion 

Along with a document's content, metadata is incredibly important in understanding what's really going on with documents in a data set. Carefully considered and used well, it can help you gain a deeper understanding of what's relevant in both eDiscovery and digital forensics.

Dr. Gavin Manes
CEO at Avansic
Dr. Gavin Manes is a nationally recognized eDiscovery and digital forensics expert. He founded Avansic in 2004 after completing his Doctorate in Computer Science from the University of Tulsa. At Avansic, Dr. Manes is committed to high-technology innovation, research, and mentorship, and has several patents pending. Avansic's scientific approach to eDiscovery and digital forensics stems from his academic experience.

Dr. Manes routinely serves as an expert witness including consulting with attorneys on data preservation issues. He contributes academic content to peer-reviewed journals and delivers classroom lectures. See his full CV at gavinmanes.com.

Dr. Manes has published over fifty papers on eDiscovery, digital forensics, and computer security, countless blog posts, and educational presentations to attorneys, executives, professors, law enforcement, and professional groups on topics from eDiscovery to cyber law. He’s briefed the White House, the Department of the Interior, the National Security Council, and the Pentagon on computer security and forensics issues.

At the University, Dr. Manes formed the Tulsa Digital Forensics Center, housing Cyber Crime Units from local, state, and federal law enforcement agencies. He’s a founder of the University of Tulsa’s Institute for Information Security, leading the creation of nationally recognized research efforts in digital forensics and telecommunications security.
ACEDS
