Active learning strategies for handwritten text transcription

Authors:
Nicolás Serrano;Adrià Giménez;Albert Sanchis;Alfons Juan
Affiliations:
Universitat Politècnica de València (UPV) Camí de Vera, València, Spain;Universitat Politècnica de València (UPV) Camí de Vera, València, Spain;Universitat Politècnica de València (UPV) Camí de Vera, València, Spain;Universitat Politècnica de València (UPV) Camí de Vera, València, Spain
Venue:
International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
Year:
2010

Citing 5

Computer Assisted Transcription of Handwritten Text Images
ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 02

Interactive information extraction with constrained conditional random fields
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence

The GERMANA Database
ICDAR '09 Proceedings of the 2009 10th International Conference on Document Analysis and Recognition

Adaptation from partially supervised handwritten text transcriptions
Proceedings of the 2009 international conference on Multimodal interfaces

Balancing error and supervision effort in interactive-predictive handwriting recognition
Proceedings of the 15th international conference on Intelligent user interfaces

Cited 3

Language identification for interactive handwriting transcription of multilingual documents
IbPRIA'11 Proceedings of the 5th Iberian conference on Pattern recognition and image analysis

Transcribing handwritten text images with a word soup game
CHI '12 Extended Abstracts on Human Factors in Computing Systems

Effective balancing error and user effort in interactive handwriting recognition
Pattern Recognition Letters

Abstract

Active learning strategies are increasingly used in a variety of real-world tasks, though their application to handwritten text transcription in old manuscripts remains nearly unexplored. The basic idea is to follow a sequential, line-by-line transcription of the whole manuscript in which a continuously retrained system interacts with the user to efficiently transcribe each new line. This approach has recently been explored using a conventional strategy in which the user is only asked to supervise words that are not recognized with high confidence. In this paper, the conventional strategy is improved by also letting the system recompute the most probable hypotheses under the constraints imposed by user supervisions. In particular, two strategies are studied that differ in how often hypotheses are recomputed on the current line: after each user correction (iterative) or after all of them (delayed). Empirical results on two real tasks show that these strategies outperform the conventional approach.
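
The three supervision strategies named in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' system: the brute-force `decode` function stands in for a real handwriting recognizer, word confidence is taken to be a posterior-style marginal, the user is simulated by a reference transcription, and supervising a word means confirming or correcting it. The names `decode`, `transcribe`, `CANDS`, `BIGRAM` and `REFERENCE` are hypothetical.

```python
from itertools import product


def decode(cands, bigram, fixed=None):
    """Brute-force constrained decoding of one toy line.

    cands:  list of {word: score} dicts, one per word position.
    bigram: {(prev_word, word): weight}; unseen pairs get weight 1.0.
    fixed:  {position: word} constraints coming from user supervision.
    Returns the best word sequence and one confidence value per word.
    """
    fixed = fixed or {}
    # A supervised position has exactly one admissible word.
    options = [{fixed[i]: 1.0} if i in fixed else c for i, c in enumerate(cands)]
    scored = []
    for seq in product(*options):
        s = 1.0
        for i, w in enumerate(seq):
            s *= options[i][w]
            if i > 0:
                s *= bigram.get((seq[i - 1], w), 1.0)
        scored.append((s, seq))
    total = sum(s for s, _ in scored)
    best = max(scored, key=lambda t: t[0])[1]
    # Confidence of a word: share of the total score mass that agrees with it.
    conf = [
        sum(s for s, seq in scored if seq[i] == w) / total
        for i, w in enumerate(best)
    ]
    return list(best), conf


def transcribe(cands, bigram, reference, threshold, strategy):
    """Simulate one line; returns (final words, number of supervised words)."""
    fixed = {}
    words, conf = decode(cands, bigram)
    if strategy == "iterative":
        # Supervise the first low-confidence word, recompute, and repeat.
        while True:
            low = [i for i, c in enumerate(conf) if i not in fixed and c < threshold]
            if not low:
                break
            fixed[low[0]] = reference[low[0]]
            words, conf = decode(cands, bigram, fixed)
    else:
        # Supervise every word that is low-confidence in the first hypothesis.
        for i, c in enumerate(conf):
            if c < threshold:
                fixed[i] = reference[i]
        if strategy == "delayed":
            # Recompute once, after all supervisions on the line.
            words, _ = decode(cands, bigram, fixed)
        else:
            # "conventional": patch supervised words, no recomputation.
            words = [fixed.get(i, w) for i, w in enumerate(words)]
    return words, len(fixed)


# Toy line whose unconstrained best hypothesis is "an odd man".
CANDS = [
    {"an": 0.9, "on": 0.1},
    {"old": 0.25, "odd": 0.75},
    {"map": 0.2, "man": 0.8},
]
BIGRAM = {("old", "map"): 9.0}
REFERENCE = ["an", "old", "map"]

for strategy in ("conventional", "delayed", "iterative"):
    print(strategy, transcribe(CANDS, BIGRAM, REFERENCE, 0.55, strategy))
```

In this toy line, with a threshold of 0.55 only the second word is supervised. The conventional strategy leaves the wrong third word in place, while both recomputing strategies recover it from the constraint. With a threshold of 0.6, the iterative strategy needs one supervision where the other two need two. These numbers describe the toy data only and say nothing about the results reported in the paper.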