Tools for the efficient generation of hand-drawn corpora based on context-free grammars

  • Authors:
  • Scott MacLean; David Tausky; George Labahn; Edward Lank; Mirette Marzouk

  • Affiliations:
  • University of Waterloo; University of Waterloo; University of Waterloo; University of Waterloo; University of Waterloo

  • Venue:
  • Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling
  • Year:
  • 2009

Abstract

In sketch recognition systems, ground-truth data sets serve both to train and to test recognition algorithms. Unfortunately, generating data sets that are sufficiently large and varied is frequently a costly and time-consuming endeavour. In this paper, we present a novel technique for creating a large and varied ground-truthed corpus for hand-drawn math recognition. Candidate math expressions for the corpus are generated via random walks through a context-free grammar; human writers then transcribe the expressions, and an algorithm automatically generates ground-truth data for the individual symbols and the inter-symbol relationships within each expression. Although the techniques we develop in this paper are illustrated through the creation of a ground-truthed corpus of mathematical expressions, they apply to any sketching domain that can be described by a formal grammar.
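The abstract does not spell out how the random walk is performed. The following is a minimal sketch under stated assumptions: the toy grammar (nonterminals `EXPR`, `TERM`, `OP`, `SYM`), the relation labels (`right`, `superscript`, `subscript`), the depth limit, and the function `random_walk` are all hypothetical illustrations, not the authors' grammar or code. At each nonterminal the walk picks a production uniformly at random. Because every expression comes from a known derivation, the symbols and a simplified set of spatial relations can be recorded during the walk. The sketch does not attempt to align these labels with a writer's transcribed ink.

```python
import random
from typing import Optional

# Hypothetical toy grammar (an assumption, not the paper's). Each production is
# a list of (child, relation) pairs; `relation` labels the spatial relation from
# the previous sibling's head symbol to this child's head symbol (None for the
# first child). Keys are nonterminals; any other string is a terminal symbol.
# Convention: the first production of each nonterminal is non-recursive, so it
# can serve as a base case once the depth limit is reached.
Production = list[tuple[str, Optional[str]]]

GRAMMAR: dict[str, list[Production]] = {
    "EXPR": [
        [("TERM", None)],
        [("TERM", None), ("OP", "right"), ("EXPR", "right")],
    ],
    "TERM": [
        [("SYM", None)],
        [("SYM", None), ("SYM", "superscript")],
        [("SYM", None), ("SYM", "subscript")],
    ],
    "OP": [[("+", None)], [("-", None)]],
    "SYM": [[("x", None)], [("y", None)], [("2", None)], [("3", None)]],
}


def random_walk(grammar, start="EXPR", max_depth=4, rng=None):
    """Randomly expand `start` (hypothetical helper).

    Returns (symbols, relations): terminal symbols in left-to-right order and
    a list of (parent_index, relation, child_index) edges indexing `symbols`.
    """
    rng = rng or random.Random()
    symbols: list[str] = []
    relations: list[tuple[int, str, int]] = []

    def expand(node: str, depth: int) -> int:
        # A terminal is recorded and becomes the head of its own subtree.
        if node not in grammar:
            symbols.append(node)
            return len(symbols) - 1
        options = grammar[node]
        # Past the depth limit, take the non-recursive base case to terminate.
        production = options[0] if depth >= max_depth else rng.choice(options)
        head = None
        prev = None
        for child, rel in production:
            child_head = expand(child, depth + 1)
            # Link this child's head to the previous sibling's head.
            if prev is not None and rel is not None:
                relations.append((prev, rel, child_head))
            if head is None:
                head = child_head
            prev = child_head
        return head

    expand(start, 0)
    return symbols, relations


if __name__ == "__main__":
    syms, rels = random_walk(GRAMMAR, rng=random.Random(0))
    print(" ".join(syms))
    for parent, rel, child in rels:
        print(f"{syms[parent]} --{rel}--> {syms[child]}")
```

Seeding the generator makes a corpus reproducible. Falling back to the first production past `max_depth` is one simple way to guarantee termination; uniform choice is likewise only one policy, and weighting productions would change the mix of expression shapes the writers are asked to transcribe.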