Crowdsourcing in the document processing practice

  • Authors:
  • Ehud D. Karnin; Eugene Walach; Tal Drory

  • Affiliations:
  • IBM Research Haifa Lab, Haifa, Israel; IBM Research Haifa Lab, Haifa, Israel; IBM Research Haifa Lab, Haifa, Israel

  • Venue:
  • ICWE'10 Proceedings of the 10th international conference on Current trends in web engineering
  • Year:
  • 2010

Abstract

The processing of scanned documents calls for automatic recognition of the text by OCR (Optical Character Recognition) programs, followed by human validation and correction. Crowdsourcing these essential manual tasks is a good option, provided that several key challenges are addressed so that the quality level expected by the customer is met. We show how tools for efficient validation and correction are adapted and enhanced to address issues associated with crowdsourcing, such as data privacy, quality control, crowd monitoring, and job quality assurance. We have begun implementing these ideas and technologies in our COoperative eNgine for Correction of ExtRacted Text (CONCERT), which is used in book digitization projects.