Using a boosted tree classifier for text segmentation in hand-annotated documents
Pattern Recognition Letters
In this paper, we propose a tree-structured multi-class classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. A conventional decision tree (DT) considers only a subset of the training data at each node and is susceptible to over-fitting. In contrast, we boost the tree by using all training data at each node, with different weights. We evaluate the proposed method on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
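The contrast between a DT node, which trains only on the samples routed to it, and a boosted-tree node, which trains on all samples with different weights, can be illustrated with a minimal binary sketch. This is not the authors' algorithm. The paper's classifier is multi-class, and this abstract does not give its weighting scheme. The sketch assumes a hypothetical `outside_weight` parameter that down-weights samples routed to other branches instead of dropping them, so `outside_weight=0` recovers an ordinary DT split. The functions `fit_weighted_stump`, `build_tree`, and `predict` are illustrative names, and each node's binary weak learner is assumed to be a weighted decision stump.

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Binary weak learner: choose (feature, threshold, polarity) minimizing
    the weighted 0/1 error. Labels y must be in {-1, +1}."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = np.where(X[:, j] <= t, 1, -1)
            for polarity in (1, -1):
                err = np.sum(w[polarity * pred != y])
                if err < best[0]:
                    best = (err, j, t, polarity)
    _, j, t, polarity = best
    return j, t, polarity

def stump_predict(stump, X):
    j, t, polarity = stump
    return polarity * np.where(X[:, j] <= t, 1, -1)

def build_tree(X, y, in_node, max_depth, outside_weight=0.1):
    """in_node: boolean mask of samples routed to this node.
    Every node sees ALL samples; those routed elsewhere get the
    (hypothetical) reduced weight outside_weight. 0 gives a plain DT node."""
    w = np.where(in_node, 1.0, outside_weight)
    w = w / w.sum()
    node = {"stump": fit_weighted_stump(X, y, w), "children": {}}
    pred = stump_predict(node["stump"], X)
    for side in (1, -1):
        mask = in_node & (pred == side)
        labels = y[mask]
        if max_depth > 1 and len(np.unique(labels)) > 1:
            # Mixed labels: grow a subtree (still trained on all samples).
            node["children"][side] = build_tree(X, y, mask, max_depth - 1, outside_weight)
        elif labels.size:
            # Leaf: majority label of the routed samples (ties go to +1).
            node["children"][side] = 1 if labels.sum() >= 0 else -1
        else:
            # No samples routed here: fall back to the stump's own output.
            node["children"][side] = side
    return node

def predict(node, X):
    out = []
    for x in X:
        cur = node
        while isinstance(cur, dict):
            side = int(stump_predict(cur["stump"], x[None, :])[0])
            cur = cur["children"][side]
        out.append(cur)
    return np.array(out)

# Usage: a depth-2 tree on an XOR-style toy problem.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])
tree = build_tree(X, y, np.ones(len(y), dtype=bool), max_depth=2, outside_weight=0.1)
print(predict(tree, X).tolist())
```

On this toy data, both `outside_weight=0.0` (plain DT) and `outside_weight=0.1` fit the training set exactly. On real data, the intent of keeping all samples with nonzero weight is that each node's split is less driven by the small subset that reaches it, which is the over-fitting concern the abstract raises.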