Learning to discriminate text from synthetic data

  • Authors:
  • José Antonio Álvarez Ruiz

  • Affiliations:
  • Bonn-Rhine-Sieg University of Applied Sciences, Computer Science Department, Sankt Augustin, Germany

  • Venue:
  • Robot Soccer World Cup XV
  • Year:
  • 2012

Abstract

Service robots could use textual information to perform important tasks such as product identification. However, natural scene text, such as that found in household environments, varies widely in size, color, font, layout, symbol repertoire, language, and more. This variability makes robust text information extraction extremely difficult. Our approach to text information extraction from gray-scale still images combines adaptive binarization, connected component classification with a support vector machine, and filtering based on the proximity of connected components to their neighbours. Its contribution is the use of a partially synthetic dataset for training, which reduces the burden of ground-truth labelling at the connected component level. Our experiments show that a classifier trained with synthetic data can generalize to real instances. We present our results on the ICDAR dataset.
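
The three stages named in the abstract (adaptive binarization, connected component classification, proximity filtering) can be illustrated with a minimal sketch. This is not the author's implementation: the local-mean threshold, the 4-connectivity, the bounding-box gap measure, and every function name and parameter value below are assumptions chosen for illustration. The support vector machine is represented only by a hypothetical callable `is_text`, which a trained classifier would replace.

```python
from collections import deque


def adaptive_binarize(img, radius=1, offset=10):
    """Mark a pixel as foreground (1) if it is darker than its local mean minus offset.

    Assumption: dark text on a lighter background; img is a list of rows of gray values.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < mean - offset else 0
    return out


def connected_components(binary):
    """Label 4-connected foreground regions; return bounding boxes (x0, y0, x1, y1)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def box_gap(a, b):
    """Distance between two boxes as the larger of their horizontal and vertical offsets."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def filter_by_proximity(boxes, max_gap):
    """Keep only boxes that have at least one other box within max_gap."""
    return [
        a for i, a in enumerate(boxes)
        if any(box_gap(a, b) <= max_gap for j, b in enumerate(boxes) if j != i)
    ]


def extract_text_boxes(img, is_text, radius=1, offset=10, max_gap=2):
    """Run the three stages; is_text is a hypothetical stand-in for the trained classifier."""
    binary = adaptive_binarize(img, radius, offset)
    candidates = [box for box in connected_components(binary) if is_text(box)]
    return filter_by_proximity(candidates, max_gap)
```

In this sketch an isolated component is discarded even if the classifier accepts it, reflecting the idea that characters tend to appear next to other characters. A real classifier would work on features computed from each component rather than on the bounding box alone.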