Text Separation from Mixed Documents Using a Tree-Structured Classifier

  • Authors:
  • Xujun Peng;Srirangaraj Setlur;Venu Govindaraju;Ramachandrula Sitaram

  • Venue:
  • ICPR '10 Proceedings of the 2010 20th International Conference on Pattern Recognition
  • Year:
  • 2010

Abstract

In this paper, we propose a tree-structured multi-class classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT), which considers only a subset of the training data at each node and is therefore susceptible to over-fitting, our tree is boosted using all of the training data at each node, with different weights. The proposed method is evaluated on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
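
The idea of a tree whose nodes are binary weak learners trained on all samples with per-node weights can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give the weak learner, the tree shape, or the weight update, so everything below is an assumption made for illustration. The sketch assumes a chain-shaped tree, a weighted decision stump at each node, a hypothetical `decay` factor that down-weights (rather than discards) samples already routed to a leaf, and a made-up one-dimensional feature with the class names `printed`, `annotation`, and `overlap`.

```python
def stump_predict(stump, row):
    """Return +1 or -1 for one sample using a (feature, threshold, polarity, ...) stump."""
    feature, threshold, polarity = stump[:3]
    return polarity if row[feature] < threshold else -polarity


def fit_stump(X, target, w):
    """Weighted decision stump: pick the split with the lowest weighted error.

    Assumes at least one feature has two distinct values.
    """
    best = None
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            for polarity in (1, -1):
                candidate = (feature, threshold, polarity)
                err = sum(wi for row, ti, wi in zip(X, target, w)
                          if stump_predict(candidate, row) != ti)
                if best is None or err < best[3]:
                    best = (feature, threshold, polarity, err)
    return best


def fit_tree(X, y, class_order, decay=0.1):
    """Chain-shaped tree: node k separates class_order[k] from everything else.

    Every node is trained on ALL samples. Samples that a node sends to its
    leaf are kept but down-weighted by `decay` (an assumed rule, not taken
    from the paper) instead of being removed as in an ordinary decision tree.
    """
    n = len(X)
    w = [1.0 / n] * n
    nodes = []
    for cls in class_order[:-1]:
        target = [1 if yi == cls else -1 for yi in y]
        stump = fit_stump(X, target, w)
        nodes.append((cls, stump))
        w = [wi * decay if stump_predict(stump, row) == 1 else wi
             for row, wi in zip(X, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return nodes, class_order[-1]


def predict(tree, row):
    """Walk the chain; the first node that fires decides the class."""
    nodes, default_class = tree
    for cls, stump in nodes:
        if stump_predict(stump, row) == 1:
            return cls
    return default_class


# Toy data with one hypothetical feature per sample.
X = [[0.0], [0.5], [1.0], [4.0], [4.5], [5.0], [8.0], [8.5], [9.0]]
y = ["printed"] * 3 + ["annotation"] * 3 + ["overlap"] * 3
tree = fit_tree(X, y, ["printed", "annotation", "overlap"])
print([predict(tree, row) for row in X])
```

In this toy setting the second node still sees the `printed` samples, but their reduced weight lets the stump focus on separating `annotation` from `overlap`. A real system would use richer features and a stronger boosted learner at each node.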