Grammar-based techniques for creating ground-truthed sketch corpora

  • Authors:
  • Scott MacLean; George Labahn; Edward Lank; Mirette Marzouk; David Tausky

  • Affiliations:
  • University of Waterloo, David R. Cheriton School of Computer Science, 200 University Avenue West, N2L 3G1, Waterloo, ON, Canada (all authors)

  • Venue:
  • International Journal on Document Analysis and Recognition - Special Issue on Performance Evaluation
  • Year:
  • 2011

Abstract

Although publicly available, ground-truthed corpora have proven useful for training, evaluating, and comparing recognition systems in many domains, the availability of such corpora for sketch recognizers, and math recognizers in particular, is currently quite poor. This paper presents a general approach to creating large, ground-truthed corpora for structured sketch domains such as mathematics. In the approach, random sketch templates are generated automatically using a grammar model of the sketch domain. These templates are transcribed manually, then automatically annotated with ground-truth. The annotation procedure uses the generated sketch templates to find a matching between transcribed and generated symbols. A large, ground-truthed corpus of handwritten mathematical expressions presented in the paper illustrates the utility of the approach.
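
As a rough illustration of the pipeline described in the abstract, the following minimal Python sketch generates a random template from a grammar and then attaches ground-truth labels to a transcription. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `GRAMMAR`, the helper names `generate`, `leaves`, and `annotate`, and the depth cap are all hypothetical. In particular, `annotate` uses a naive order-based pairing that assumes the writer transcribed the symbols in template order, one stroke group per symbol; the matching procedure in the paper may well differ.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of alternative productions.
# Any symbol that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "EXPR": [["TERM"], ["TERM", "OP", "EXPR"]],
    "TERM": [["VAR"], ["NUM"], ["VAR", "^", "NUM"]],
    "OP": [["+"], ["-"], ["="]],
    "VAR": [["x"], ["y"]],
    "NUM": [["1"], ["2"]],
}


def generate(symbol, rng, depth=0, max_depth=4):
    """Return a random derivation tree (label, children) rooted at `symbol`."""
    if symbol not in GRAMMAR:
        return (symbol, [])  # terminal leaf
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth cap, take the shortest production so recursion ends.
        production = min(productions, key=len)
    else:
        production = rng.choice(productions)
    children = [generate(s, rng, depth + 1, max_depth) for s in production]
    return (symbol, children)


def leaves(tree):
    """Return the terminal symbols of a derivation tree, left to right."""
    label, children = tree
    if not children:
        return [label]
    return [t for child in children for t in leaves(child)]


def annotate(template_tree, transcribed_groups):
    """Pair each template symbol with one transcribed stroke group.

    Assumption (not from the paper): groups arrive in template order,
    exactly one group per symbol.
    """
    terminals = leaves(template_tree)
    if len(terminals) != len(transcribed_groups):
        raise ValueError("transcription does not match the template")
    return list(zip(terminals, transcribed_groups))


rng = random.Random(0)  # fixed seed so the same template can be regenerated
template = generate("EXPR", rng)
print(" ".join(leaves(template)))
```

Because the derivation tree is kept alongside the terminal sequence, the structural ground truth (which symbols form a term, which symbol is an exponent, and so on) comes from the generator at no extra cost; only the pairing of ink to symbols has to be recovered after transcription.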