Quality control mechanisms for crowdsourcing: peer review, arbitration, & expertise at FamilySearch Indexing

  • Authors and affiliations:
  • Derek L. Hansen (Brigham Young University, Provo, Utah, USA); Patrick J. Schone (FamilySearch, Salt Lake City, Utah, USA); Douglas Corey (Brigham Young University, Provo, Utah, USA); Matthew Reid (Brigham Young University, Provo, Utah, USA); Jake Gehring (FamilySearch, Salt Lake City, Utah, USA)

  • Venue:
  • Proceedings of the 2013 Conference on Computer Supported Cooperative Work (CSCW '13)
  • Year:
  • 2013

Abstract

The FamilySearch Indexing project has enabled hundreds of thousands of volunteers to transcribe billions of records, making it one of the largest crowdsourcing initiatives in the world. Assuring high-quality transcriptions (i.e., indexes) with a reasonable amount of volunteer effort is essential to keep pace with the mounds of newly digitized documents. Using historical data, we show the effect of prior experience and native language on transcriber agreement. We then present a field experiment comparing the effectiveness (accuracy) and efficiency (time) of two quality control mechanisms: (1) Arbitration -- the existing mechanism wherein two volunteers independently transcribe records and disagreements go to an arbitrator, and (2) Peer Review -- a mechanism wherein one volunteer's work is reviewed by another volunteer. Peer Review is significantly more efficient, though not as effective for certain fields as Arbitration. Design suggestions for FamilySearch Indexing and related crowdsourcing initiatives are provided.
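The two mechanisms described in the abstract can be read as simple workflows. The sketch below (Python, not taken from the paper; the Record type and all function names are hypothetical placeholders) illustrates the control flow of each: Arbitration accepts fields on which two independent transcriptions agree and escalates disagreements to an arbitrator, while Peer Review routes a single transcription through one reviewer.

```python
"""Illustrative sketch (not from the paper) of the two quality-control
workflows compared in the abstract. Field names, helper callables, and
the Record type are hypothetical; only the control flow is the point."""

from typing import Callable, Dict

Record = Dict[str, str]  # one transcribed record: field name -> value


def arbitration(transcribe_a: Callable[[], Record],
                transcribe_b: Callable[[], Record],
                arbitrate: Callable[[str, str, str], str]) -> Record:
    """Arbitration: two volunteers transcribe independently; an arbitrator
    resolves only the fields on which the two transcriptions disagree."""
    a, b = transcribe_a(), transcribe_b()
    final: Record = {}
    for field in a:  # assumes both transcriptions cover the same fields
        if a[field] == b[field]:
            final[field] = a[field]  # independent agreement is accepted as-is
        else:
            final[field] = arbitrate(field, a[field], b[field])
    return final


def peer_review(transcribe: Callable[[], Record],
                review: Callable[[str, str], str]) -> Record:
    """Peer Review: one volunteer transcribes; a second volunteer reviews
    (and may correct) each field, so only two people touch the record."""
    draft = transcribe()
    return {field: review(field, value) for field, value in draft.items()}
```

In this reading, Arbitration always costs two full independent transcriptions plus arbitration of any disagreeing fields, whereas Peer Review costs one transcription plus one review pass; that difference is one plausible way to see why the abstract reports Peer Review as more efficient while Arbitration retains an independent-agreement check on harder fields.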