Document categorization in legal electronic discovery: computer classification vs. manual review

  • Authors:
  • Herbert L. Roitblat; Anne Kershaw; Patrick Oot

  • Affiliations:
  • Electronic Discovery Institute, OrcaTec LLC, PO Box 613, Ojai, CA 93024
  • Electronic Discovery Institute, A. Kershaw, P.C. Attorneys & Consultants, 303 South Broadway, Suite 430, Tarrytown, NY 10591
  • Electronic Discovery Institute, Verizon, 1320 North Courthouse Road, Arlington, VA 22201

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2010

Abstract

In litigation in the US, the parties are obligated to produce to one another, when requested, those documents that are potentially relevant to the issues and facts of the litigation (a process called "discovery"). As the volume of electronic documents continues to grow, the expense of meeting this obligation threatens to surpass the amounts at issue, and the time required to identify the relevant documents can delay a case for months or years. The same holds true for government investigations and for third parties served with subpoenas. As a result, litigants are looking for ways to reduce the time and expense of discovery. One approach is to supplant or reduce the traditional practice of having people, usually attorneys, read each document with automated procedures that use information retrieval and machine categorization to identify the relevant documents. This study compared an original categorization, obtained as part of a response to a Department of Justice request and produced by having one or more of 225 attorneys review each document, with automated categorization systems provided by two legal service providers. The goal was to determine whether the automated systems could categorize documents at least as well as human reviewers could, thereby saving time and expense. The results support the idea that machine categorization is no less accurate at identifying relevant-responsive documents than employing a team of reviewers. Based on these results, it would appear that machine categorization can be a reasonable substitute for human review. © 2010 Wiley Periodicals, Inc.
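To make the kind of comparison the study describes concrete, the sketch below shows one straightforward way to score a machine categorization against a reference human review using precision, recall, overall agreement, and Jaccard overlap. It is illustrative only: the function name, document IDs, and labels are hypothetical and are not drawn from the study or its methodology.

```python
# Illustrative sketch (not the study's actual procedure): score a machine
# classifier's responsiveness calls against an original human review,
# treating the human review as the reference standard.

def agreement_metrics(human_labels, machine_labels):
    """Compute simple agreement statistics between two binary labelings.

    Both inputs are dicts mapping document IDs to True (responsive)
    or False (non-responsive).
    """
    docs = human_labels.keys() & machine_labels.keys()
    tp = sum(1 for d in docs if human_labels[d] and machine_labels[d])
    fp = sum(1 for d in docs if not human_labels[d] and machine_labels[d])
    fn = sum(1 for d in docs if human_labels[d] and not machine_labels[d])
    tn = sum(1 for d in docs if not human_labels[d] and not machine_labels[d])

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    agreement = (tp + tn) / len(docs) if docs else 0.0
    return {"precision": precision, "recall": recall,
            "agreement": agreement, "jaccard": jaccard}

# Hypothetical example: three documents labeled by a review team and a classifier.
human = {"doc1": True, "doc2": False, "doc3": True}
machine = {"doc1": True, "doc2": True, "doc3": True}
print(agreement_metrics(human, machine))
```

In practice, because human reviewers also disagree with one another, such measures are usually read as levels of agreement with the original review rather than as accuracy against an absolute ground truth.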