Journal on Computing and Cultural Heritage (JOCCH)
In this paper, we develop efficient techniques for segmenting document pages obtained by digitizing historical machine-printed sources. Such documents often suffer from low quality and local skew, show degradations caused by the poor quality of the old printing matrix or by ink diffusion, and exhibit complex and dense layouts. To address these problems, we introduce the following innovative aspects: (i) a novel Adaptive Run Length Smoothing Algorithm (ARLSA) to handle complex and dense document layouts, (ii) detection of the noisy areas and punctuation marks that are common in historical machine-printed documents, (iii) detection of possible obstacles formed by background areas, used to separate neighboring text columns or text lines, and (iv) skeleton segmentation paths to isolate characters that may be connected. Comparative experiments on several historical machine-printed documents demonstrate the efficiency of the proposed technique.
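For readers unfamiliar with run length smoothing, the following is a minimal sketch of the classic fixed-threshold horizontal RLSA, which ARLSA builds on. It is not the adaptive algorithm proposed in the paper: the abstract does not state the adaptive rules, so none are reproduced here. The function names (`rlsa_row`, `rlsa_horizontal`), the `max_gap` parameter, and the convention that 1 marks a foreground (ink) pixel are assumptions made for illustration only.

```python
def rlsa_row(row, max_gap):
    """Classic RLSA on one binary row (1 = foreground, 0 = background).

    Background runs of length <= max_gap that are bounded by foreground
    pixels on both sides are filled with foreground. Runs touching the
    row border are left unchanged.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None:
                gap = i - last_fg - 1  # background pixels between the two
                if 0 < gap <= max_gap:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, max_gap):
    """Apply the row-wise smoothing to every row of a binary image."""
    return [rlsa_row(row, max_gap) for row in image]


if __name__ == "__main__":
    page = [
        [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
        [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    ]
    for smoothed in rlsa_horizontal(page, max_gap=2):
        print(smoothed)
```

With a single global `max_gap`, a value large enough to merge characters into words can also merge neighboring columns in a dense layout. That is the kind of situation an adaptive threshold is meant to address.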