Document layout analysis aims to locate textlines and text regions in document images, mostly through a series of split-or-merge operations. Before applying such an operation, however, the context must be examined to decide whether the chosen location is appropriate. We therefore treat document layout analysis as a series of binary decision problems, each asking whether or not to apply a split-or-merge operation at a chosen location. To solve these problems, we use support vector machines that learn when to apply the operations from training documents in which all textlines and text regions have been located and their identities labeled. The proposed approach is very effective for analyzing documents that allow both horizontal and vertical reading orders. When applied to a test data set composed of eight types of layout structure, the approach achieves accuracy rates of 98.83% and 96.72% for identifying textlines and text regions, respectively.
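The idea of casting a merge operation as a binary decision can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the abstract does not state which features, kernel, or training procedure are used, so the two features (normalized gap and height ratio between neighbouring components), the toy training data, and the plain-Python linear SVM trained by subgradient descent are all hypothetical stand-ins.

```python
# Hypothetical features for a candidate merge of two neighbouring components:
# (gap between them / mean height, height ratio of the two components).
# Labels: +1 = merge, -1 = do not merge. The values are invented toy data.
TRAIN = [
    ((0.10, 0.90), 1), ((0.20, 1.00), 1), ((0.30, 0.80), 1), ((0.25, 0.95), 1),
    ((1.50, 0.50), -1), ((2.00, 0.90), -1), ((1.20, 0.40), -1), ((1.80, 0.70), -1),
]


def decision_value(w, b, x):
    """Signed score w.x + b of the linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(samples, lam=0.01, epochs=300, lr=0.1):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    This is a simple linear SVM used as a stand-in; the source only says
    that support vector machines are used.
    """
    dim = len(samples[0][0])
    n = len(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in samples:
            # Only samples inside the margin contribute to the hinge subgradient.
            if y * decision_value(w, b, x) < 1.0:
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def should_merge(model, features):
    """Binary decision: apply the merge operation at this place or not."""
    w, b = model
    return decision_value(w, b, features) > 0.0


MODEL = train_linear_svm(TRAIN)

if __name__ == "__main__":
    for candidate in [(0.15, 0.90), (1.60, 0.65)]:
        print(candidate, "merge" if should_merge(MODEL, candidate) else "keep apart")
```

In a full system, a decision function of this kind would be consulted before each candidate split-or-merge operation, with one classifier per kind of decision; how the candidates are generated and which context features are measured is left open here.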