Transforming a paper document into an electronic version suitable for efficient storage, retrieval, and interpretation remains a challenging problem. Solving it requires an efficient representation scheme for document images. Document representation involves thresholding, skew detection, geometric layout analysis, and logical layout analysis; the derived representation can then be used for document storage and retrieval. Page segmentation is an important stage in representing document images obtained by scanning journal pages. The performance of a document understanding system depends greatly on the correctness of page segmentation and on the labeling of regions such as text, tables, images, drawings, and rulers. In this paper, we use the traditional bottom-up approach, based on connected component extraction, to implement page segmentation and region identification efficiently. We propose a new document model that preserves top-down generation information; based on this model, a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis. Our algorithm achieves high accuracy and takes approximately 1.4 seconds on an SGI Indy workstation for model creation, including orientation estimation, segmentation, and labeling (text, table, image, drawing, and ruler), for a 2,550 × 3,300 image of a typical journal page scanned at 300 dpi. The method is applicable to documents from various technical journals and can accommodate moderate amounts of skew and noise.
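The abstract names connected component extraction as the starting point of the bottom-up approach but does not say how the components are computed. As a rough illustration only, the sketch below labels 8-connected foreground regions of a small binary image with a breadth-first flood fill and returns one bounding box per region. The function name `connected_components`, the list-of-rows image format, and the choice of 8-connectivity are assumptions made for this example, not details taken from the paper.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected regions.

    `image` is a list of equal-length rows; nonzero pixels are foreground.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                # Grow the bounding box to include the current pixel.
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                # Visit all eight neighbours (assumed connectivity).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy binary "page" with two separate blobs.
page = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
boxes = connected_components(page)
print(boxes)
```

In a bottom-up pipeline of the kind described, boxes like these would typically be grouped into larger regions before labeling; that grouping step is not shown here because the abstract gives no details about it.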