An annotation assistance system using an unsupervised codebook composed of handwritten graphical multi-stroke symbols

  • Authors:
  • Jinpeng Li;Harold Mouchère;Christian Viard-Gaudin

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2014

Abstract

Many current recognition systems rely on ground-truthed datasets for training, evaluation, and testing, but creating such datasets is a tedious task. This paper proposes an iterative, unsupervised framework for learning handwritten graphical symbols that can assist with this labeling task. Each stroke is initialized as a segment, and a relational graph is constructed in which the nodes are the segments and the edges are the spatial relations between them. To extract the relevant patterns, both the segments and the spatial relations are quantized. Discovering graphical symbols then becomes the problem of finding sub-graphs according to the Minimum Description Length (MDL) principle. The discovered graphical symbols become the new segments for the next iteration. In each iteration, the quantization of segments yields a codebook in which the user can label graphical symbols. This original method was first applied to a dataset of simple mathematical expressions. The results reported in this work show that only 58.2% of the strokes have to be manually labeled.
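The relational-graph step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The segment labels (`hbar`, `vbar`), the four-direction relation quantizer, the `max_dist` threshold, and all function names are assumptions made for illustration, and a plain frequency count stands in for the MDL-based sub-graph search.

```python
import math
from collections import Counter
from itertools import combinations


def quantize_relation(a, b):
    """Map the offset from center a to center b onto one of four coarse
    directions (hypothetical quantizer; y grows downward)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if abs(dx) >= abs(dy):
        return "right" if dx >= 0 else "left"
    return "below" if dy >= 0 else "above"


def build_graph(segments, max_dist):
    """Return labeled edges (i, j, relation) for every pair of segments
    whose centers are at most max_dist apart."""
    edges = []
    for i, j in combinations(range(len(segments)), 2):
        center_i, center_j = segments[i][1], segments[j][1]
        if math.dist(center_i, center_j) <= max_dist:
            edges.append((i, j, quantize_relation(center_i, center_j)))
    return edges


def most_frequent_pattern(segments, edges):
    """Return the most common (label, relation, label) pattern and its count.
    Frequency is used here as a crude stand-in for an MDL criterion."""
    counts = Counter(
        (segments[i][0], rel, segments[j][0]) for i, j, rel in edges
    )
    return counts.most_common(1)[0] if counts else None


# Each segment is (quantized shape label, center); values are made up.
segments = [
    ("hbar", (0.0, 0.0)), ("hbar", (0.0, 1.0)),    # two stacked bars
    ("hbar", (10.0, 0.0)), ("hbar", (10.0, 1.0)),  # two stacked bars again
    ("hbar", (20.0, 0.5)), ("vbar", (20.0, 0.6)),  # a crossing pair
]
edges = build_graph(segments, max_dist=2.0)
pattern = most_frequent_pattern(segments, edges)
print(pattern)
```

In this toy input the repeated pair of stacked horizontal bars is the candidate a user would label once (for example as an equals sign). In the iterative scheme the abstract describes, such a discovered pair would then be treated as a single segment in the next pass.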