Finding Out How Many Red Hots Are in the Jellybean Jar: Estimating Prevalence Series, Part 1
Random sampling is a powerful eDiscovery tool that can provide reliable estimates of the prevalence of relevant materials, missed materials, and more.
By Matthew Verga, JD, Xact Data Discovery
A candy store is running a contest. In the front window is a comically enormous jar of jelly beans, all different kinds and colors. Mixed in among them are a secret number of red hot cinnamon candies, similar in size and shape, all red. Whoever can guess closest to the true number of red hots mixed into the jar wins the prize. How do you guess? Do you try to count the red candies you can see, hoping they’re all red hots, and then guess at how many you can’t see? Do you try to count all the candies? Do you try to estimate volumes?
What if you were allowed to take one scoop of candies out of the enormous jar for closer examination, to determine exactly which ones in the scoop were red hots? Could you extrapolate from the scoop to the jar? How much better might your guess be then?
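To make the scoop idea concrete, here is a minimal sketch of the simplest possible extrapolation: take the share of red hots in the scoop and apply it to the whole jar. It rests on two assumptions that the story does not state: that the jar is well mixed, so the scoop behaves like a random sample, and that you know (or have separately estimated) the total number of candies in the jar. The function name and all of the numbers are hypothetical, chosen only for illustration.

```python
def estimate_red_hots(scoop_size, red_hots_in_scoop, jar_size):
    """Scale the red hot count found in the scoop up to the whole jar.

    Assumes the scoop is a random sample of a well-mixed jar and that
    jar_size (the total candy count) is known or estimated separately.
    """
    if scoop_size <= 0:
        raise ValueError("scoop_size must be positive")
    if not 0 <= red_hots_in_scoop <= scoop_size:
        raise ValueError("red_hots_in_scoop must be between 0 and scoop_size")
    # Share of red hots in the scoop, applied to the full jar.
    return red_hots_in_scoop * jar_size / scoop_size


# Hypothetical numbers: a 200-candy scoop with 14 red hots,
# drawn from a jar assumed to hold 10,000 candies.
print(estimate_red_hots(200, 14, 10_000))
```

This yields only a single best guess. It says nothing yet about how far off that guess might be, which depends on how large the scoop is.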