Using tree-grammars for training set expansion in page classification

  • Authors:
  • Stefano Baldi; Simone Marinai; Giovanni Soda

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
  • Year:
  • 2003

Abstract

In this paper we describe a method for expanding training sets composed of XY trees that represent page layout. This approach is appropriate for page classification based on MXY tree page representations. The basic idea is to use tree grammars to model the variations in the tree that are caused by segmentation algorithms. A set of general grammatical rules is defined and used to expand the training set. Pages are classified with a k-NN approach, where the distance between pages is computed by means of tree-edit distance.
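The abstract does not specify the grammar rules, the node labels, or the edit costs, so the following is only a minimal sketch of how the three pieces (training set expansion, tree-edit distance, k-NN voting) could fit together. It assumes a hypothetical encoding of XY trees as nested `(label, children)` tuples with hypothetical labels (`"H"`/`"V"` for cuts, `"text"`/`"image"` for leaf zones), a single hypothetical rewrite rule that splits a leaf into two adjacent copies to imitate over-segmentation, and unit edit costs. It is not the authors' implementation.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical encoding: a node is (label, children), children is a tuple of nodes.
# "H"/"V" stand for horizontal/vertical cuts; leaves carry a zone type.

def leaf(label):
    return (label, ())

def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost ordered tree-edit distance between two forests (memoized recursion)."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (lv, cv), (lw, cw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + cv, g) + 1,   # delete rightmost root of f, keep its children
        forest_dist(f, g[:-1] + cw) + 1,   # insert rightmost root of g
        forest_dist(f[:-1], g[:-1])        # match the two rightmost roots
        + forest_dist(cv, cw)
        + (0 if lv == lw else 1),          # relabel cost
    )

def tree_dist(a, b):
    return forest_dist((a,), (b,))

def split_leaf_variants(tree):
    """Trees obtained by splitting exactly one leaf into two adjacent copies
    (a hypothetical rule modelling over-segmentation)."""
    label, children = tree
    variants = []
    for i, child in enumerate(children):
        if not child[1]:  # leaf: duplicate it in place
            variants.append((label, children[:i] + (child, child) + children[i + 1:]))
        else:  # recurse and rebuild the parent around each sub-variant
            for sub in split_leaf_variants(child):
                variants.append((label, children[:i] + (sub,) + children[i + 1:]))
    return variants

def expand(training_set):
    """Add each distinct one-step variant of every labelled tree, keeping its class."""
    expanded = list(training_set)
    for tree, cls in training_set:
        expanded.extend((v, cls) for v in dict.fromkeys(split_leaf_variants(tree)))
    return expanded

def knn_classify(query, training_set, k=3):
    """Majority vote among the k training trees closest in tree-edit distance."""
    nearest = sorted(training_set, key=lambda item: tree_dist(query, item[0]))[:k]
    return Counter(cls for _, cls in nearest).most_common(1)[0][0]

letter = ("H", (leaf("text"), leaf("text"), leaf("text")))
form = ("H", (leaf("text"), ("V", (leaf("text"), leaf("image")))))
train = expand([(letter, "letter"), (form, "form")])
query = ("H", (leaf("text"), leaf("text"), leaf("text"), leaf("text")))
print(knn_classify(query, train, k=1))
```

Here the over-segmented query matches a generated variant of `letter` exactly (distance 0), whereas it is one insertion away from the original training tree. This is the kind of segmentation-induced variation that the expanded training set is meant to absorb.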