Segmentation of page images using the area Voronoi diagram
Computer Vision and Image Understanding - Special issue on document image understanding and retrieval
The Document Spectrum for Page Layout Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
Two Geometric Algorithms for Layout Analysis
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Automated Borders Detection and Adaptive Segmentation for Binary Document Images
ICPR '96 Proceedings of the International Conference on Pattern Recognition (ICPR '96) Volume III-Volume 7276 - Volume 7276
Pixel-Accurate Representation and Evaluation of Page Segmentation in Document Images
ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 01
Performance comparison of six algorithms for page segmentation
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Document Signature Using Intrinsic Features for Counterfeit Detection
IWCF '08 Proceedings of the 2nd international workshop on Computational Forensics
ICIAP '09 Proceedings of the 15th International Conference on Image Analysis and Processing
Hi-index | 0.00 |
We describe and evaluate a method to robustly detect the page frame in document images, locating the actual page contents area and removing textual and non-textual noise along the page borders. We use a geometric matching algorithm to find the optimal page frame, which has the advantages of not assuming the existence of whitespace between noisy borders and actual page contents, and of giving a practical solution to the page frame detection problem without the need for parameter tuning. We define suitable performance measures and evaluate the algorithm on the UW-III database. The results show that the error rates are below 4% for each of the performance measures used. In addition, we demonstrate that the use of page frame detection reduces the optical character recognition (OCR) error rate by removing textual noise. Experiments using a commercial OCR system show that the error rate due to elements outside the page frame is reduced from 4.3% to 1.7% on the UW-III dataset.