Key Sampling Concepts for Winning the Candy Contest, Estimating Prevalence Series Part 2
Random sampling is a powerful eDiscovery tool that can provide you with reliable estimates of the prevalence of relevant materials, missed materials, and more
by Matthew Verga, JD, Xact Data Discovery
In “Finding out How Many Red Hots are in the Jellybean Jar,” we discussed a candy contest hypothetical and the importance of sampling techniques to eDiscovery. In this Part, we review the key concepts you need in order to use sampling to estimate prevalence.
To use sampling to estimate how many red hots are mixed into the jellybean jar, we need to understand some basic sampling concepts: sampling frame, prevalence, confidence level, and confidence interval, as well as how each affects the required sample size. We also need to understand that whenever we refer to sampling here, we mean “simple random sampling,” in which every item within the sampling frame has an equal chance of being randomly selected for inclusion in the sample.
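To make the idea of simple random sampling concrete, here is a minimal Python sketch. The jar contents (10,000 candies, 500 of them red hots), the sample size of 400, and the helper name simple_random_sample are all hypothetical values chosen for illustration, not figures from the contest hypothetical. The sketch draws a sample in which every candy has an equal chance of selection, then scales the sample's share of red hots up to the whole jar.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every item in sampling_frame has an equal chance of being selected.
    The seed parameter exists only to make the illustration repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: a jar of 10,000 candies, 500 of them red hots.
jar = ["red hot"] * 500 + ["jellybean"] * 9500

# Hypothetical sample size of 400 candies.
sample = simple_random_sample(jar, 400, seed=42)

# Prevalence observed in the sample: the share of sampled candies that are red hots.
sample_prevalence = sample.count("red hot") / len(sample)

# Point estimate of red hots in the whole jar, scaled up from the sample.
estimated_red_hots = round(sample_prevalence * len(jar))

print(f"Sample prevalence: {sample_prevalence:.1%}")
print(f"Estimated red hots in the jar: {estimated_red_hots}")
```

The result is only a point estimate. How much it can be trusted depends on the confidence level and confidence interval, which in turn depend on the sample size.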