A boosted tree classifier is proposed to segment machine-printed, handwritten, and overlapping text in documents with handwritten annotations. Each node of the tree-structured classifier is a binary weak learner. Unlike a standard decision tree (DT), which considers only a subset of the training data at each node and is therefore susceptible to over-fitting, we boost the tree using all available training data at each node, with the samples weighted differently from node to node. The proposed method is evaluated on a set of machine-printed documents that were annotated by multiple writers in an office/collaborative environment. The experimental results show that the proposed algorithm outperforms other methods on an imbalanced data set.
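The idea of training every node on the full training set, with only the sample weights changing, can be illustrated with a small sketch. The abstract does not state the weight update, so the rule used here is an assumption made purely for illustration: samples routed to the other branch are kept but scaled by a hypothetical factor `OFF_PATH`. The weak learner is assumed to be a weighted decision stump, and the names `fit_stump`, `build`, and `predict` are likewise hypothetical.

```python
# Sketch only: OFF_PATH and the reweighting rule are illustrative assumptions,
# not the weighting scheme of the proposed method.
OFF_PATH = 0.1  # hypothetical factor for samples routed to the other branch


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                # Weighted error of the rule "predict pol if x[f] > t else -pol".
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (pol if x[f] > t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best


def build(X, y, w, depth):
    """Grow a node; every node sees all of X, only the weights w differ."""
    total = sum(w)
    w = [wi / total for wi in w]
    pos = sum(wi for yi, wi in zip(y, w) if yi == 1)
    if depth == 0 or min(pos, 1 - pos) < 1e-9:
        return {"label": 1 if pos >= 0.5 else -1}  # leaf: weighted majority
    _, f, t, _ = fit_stump(X, y, w)
    node = {"feature": f, "threshold": t}
    for side in (True, False):
        # Keep full weight for samples routed to this child, shrink the rest.
        child_w = [wi if (x[f] > t) == side else wi * OFF_PATH
                   for x, wi in zip(X, w)]
        node[side] = build(X, y, child_w, depth - 1)
    return node


def predict(node, x):
    """Follow the stump tests from the root down to a leaf label."""
    while "label" not in node:
        node = node[x[node["feature"]] > node["threshold"]]
    return node["label"]


# Toy XOR data: no single stump separates it, a depth-2 tree does.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
tree = build(X, y, [1.0] * len(X), depth=2)
print([predict(tree, x) for x in X])  # [-1, 1, 1, -1]
```

Because off-path samples are down-weighted rather than discarded, each child still fits its stump against the whole data set, which is the contrast with a standard decision tree drawn above. A boosting-style update that also raises the weight of misclassified samples could replace the routing rule, but that too would be an assumption rather than the paper's stated procedure.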