Show Your Work: Contingency Tables and Error Margins, Testing Classifiers Series Part 2
Random sampling is a powerful eDiscovery tool that can provide you with reliable measurements of the efficacy and efficiency of searches, reviewers, or other classifiers
by Matthew Verga, JD, Xact Data Discovery
In “Pop Quiz: How Do You Test a Search?,” we discussed the application of sampling techniques to testing classifiers and introduced the concepts of recall and precision. In this part, we apply those concepts to testing a hypothetical search classifier.
As we discussed in the last part, sampling can be used to test search classifiers by calculating their recall and precision. This process replaces informal sampling and gut-instinct assessments with actual data on efficacy and efficiency, which you can use both in your own planning and in discovery negotiations with other parties. Doing so requires creating a control set using simple random sampling and then reviewing that control set to identify, in advance, the items you hope the search classifier will find. Once you have a reviewed control set, you are ready to run your search classifiers against it for testing, whether they are keyword searches, TAR software, or some other kind of classifier.
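To make the workflow concrete, here is a minimal Python sketch under our own assumptions. It does not describe any particular review platform. Documents are represented by hypothetical string IDs, the reviewed control set is summarized as the set of IDs coded relevant, and a classifier's output is the set of IDs it returned. The helper names (`draw_control_set`, `score_classifier`, `ContingencyTable`) are illustrative only. The sketch draws a simple random sample and then compares each control set item's review decision with the classifier's result. It tallies the four cells of a contingency table and derives recall, TP / (TP + FN), and precision, TP / (TP + FP), from them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table comparing a classifier's results with the control set review."""
    true_positives: int   # reviewed relevant, returned by the classifier
    false_positives: int  # reviewed not relevant, returned by the classifier
    false_negatives: int  # reviewed relevant, missed by the classifier
    true_negatives: int   # reviewed not relevant, not returned

    @property
    def recall(self) -> float:
        """Share of the relevant control set items the classifier found."""
        relevant = self.true_positives + self.false_negatives
        return self.true_positives / relevant if relevant else 0.0

    @property
    def precision(self) -> float:
        """Share of the classifier's returned control set items that are relevant."""
        returned = self.true_positives + self.false_positives
        return self.true_positives / returned if returned else 0.0


def draw_control_set(population_ids, sample_size, seed=None):
    """Simple random sample (without replacement) of document IDs.

    Sorting first makes the draw reproducible for a given seed; this assumes
    the IDs are mutually comparable (e.g., all strings).
    """
    rng = random.Random(seed)
    return rng.sample(sorted(population_ids), sample_size)


def score_classifier(control_set, reviewed_relevant, classifier_hits):
    """Tally the contingency table for one classifier over the control set.

    reviewed_relevant: IDs the human review coded relevant (hypothetical input).
    classifier_hits: IDs the keyword search, TAR tool, etc. returned.
    Only control set items are counted; hits outside the control set are ignored.
    """
    tp = fp = fn = tn = 0
    for doc_id in control_set:
        relevant = doc_id in reviewed_relevant
        hit = doc_id in classifier_hits
        if relevant and hit:
            tp += 1
        elif hit:
            fp += 1
        elif relevant:
            fn += 1
        else:
            tn += 1
    return ContingencyTable(tp, fp, fn, tn)


# Hypothetical toy data: a 10-item control set, 4 items reviewed as relevant,
# and a keyword search that returned 5 of the control set items.
control = [f"DOC-{i}" for i in range(1, 11)]
relevant = {"DOC-1", "DOC-2", "DOC-3", "DOC-4"}
hits = {"DOC-1", "DOC-2", "DOC-3", "DOC-5", "DOC-6"}

table = score_classifier(control, relevant, hits)
print(table)
print(f"recall={table.recall:.2f}, precision={table.precision:.2f}")
# prints: recall=0.75, precision=0.60
```

The ten-item control set is only for illustration. A real control set would be much larger, and the recall and precision measured on it are estimates for the full collection, subject to a margin of error. Scoring each candidate search against the same reviewed control set keeps the comparisons between them consistent.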