Picture detection in document page images

Authors:
Patrick Chiu, Francine Chen, Laurent Denoue
Affiliation:
FX Palo Alto Laboratory, Palo Alto, CA, USA (all authors)
Venue:
Proceedings of the 10th ACM Symposium on Document Engineering
Year:
2010

References (7):

A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Automatic Performance Evaluation for Video Text Detection. ICDAR '01: Proceedings of the Sixth International Conference on Document Analysis and Recognition.
UpLib: A Universal Personal Digital Library System. Proceedings of the 2003 ACM Symposium on Document Engineering.
Machine Learning for Multimedia Content Analysis (Multimedia Systems and Applications).
Performance Analysis Framework for Layout Analysis Methods. ICDAR '07: Proceedings of the Ninth International Conference on Document Analysis and Recognition, Volume 02.
Performance Evaluation and Benchmarking of Six-Page Segmentation Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Abstract

We present a method for detecting pictures in document page images, which may be scanned, captured with a camera, or rendered from electronic file formats. Our method uses OCR to separate out the text and applies the Normalized Cuts algorithm to cluster the non-text pixels into picture regions. A refinement step uses the captions found in the OCR text to deduce how many pictures a picture region contains, thereby correcting for under- and over-segmentation. We evaluate performance with a scheme that takes into account both detection quality and fragmentation quality, and we benchmark our method against the ABBYY application on page images from conference papers.
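The abstract says only that Normalized Cuts clusters the non-text pixels; it does not describe the graph. The sketch below is one possible reading under our own assumptions, not the authors' implementation. It assumes that OCR word boxes arrive as end-exclusive (x0, y0, x1, y1) pixel rectangles, that the remaining ink is downsampled to one node per 8x8 block (hypothetical parameter `cell`), and that affinities depend only on node position through a Gaussian with a hypothetical scale `sigma`. It performs a single two-way split following Shi and Malik's relaxation: take the second-smallest generalized eigenvector of (D - W)y = λDy, then keep the threshold along it with the lowest Ncut value. The helper names `non_text_points` and `ncut_bipartition` are hypothetical.

```python
import numpy as np


def non_text_points(ink, word_boxes, cell=8):
    """Blank out OCR word boxes, then keep one node per cell x cell block that still has ink.

    ink: 2-D bool array (True = foreground pixel).
    word_boxes: (x0, y0, x1, y1) pixel rectangles, end-exclusive (assumed OCR output format).
    Returns an (n, 2) float array of block centres as (x, y).
    """
    ink = ink.copy()
    for x0, y0, x1, y1 in word_boxes:
        ink[y0:y1, x0:x1] = False  # text pixels are excluded from clustering
    hc, wc = ink.shape[0] // cell, ink.shape[1] // cell
    blocks = ink[:hc * cell, :wc * cell].reshape(hc, cell, wc, cell).any(axis=(1, 3))
    rows, cols = np.nonzero(blocks)
    return np.column_stack([cols, rows]).astype(float) * cell + cell / 2.0


def ncut_bipartition(points, sigma):
    """Two-way Normalized Cut of the nodes; returns a bool mask for one side.

    Assumes at least two distinct points and a sigma large enough that no node
    is isolated (every row of W has a positive sum).
    """
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian affinity on position only
    np.fill_diagonal(w, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # (D - W) y = lambda D y  <=>  L_sym z = lambda z, with z = D^{1/2} y
    lap = np.eye(len(points)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]    # second-smallest generalized eigenvector
    # Choose the splitting point along y with the lowest Ncut value.
    best_mask, best_score = None, np.inf
    for t in np.unique(y)[:-1]:
        mask = y > t
        cut = w[mask][:, ~mask].sum()
        score = cut / w[mask].sum() + cut / w[~mask].sum()
        if score < best_score:
            best_mask, best_score = mask, score
    return best_mask
```

Each call yields one cut. Applying it recursively to each side, and stopping once the Ncut value exceeds a threshold as Shi and Malik do, would give more than two regions. The dense n x n affinity matrix is the reason the sketch downsamples first; a page-sized graph would likely need a sparse neighbourhood graph instead.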
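The caption-based refinement is also described only in outline, so the following is a minimal sketch of the idea rather than the authors' rule set. It assumes that OCR output is a list of (text, box) line pairs, that picture regions and captions are (x0, y0, x1, y1) boxes with y growing downward, that a caption is any line starting with "Figure N" or "Fig. N" (a hypothetical pattern), and that a caption belongs to a region when it starts at most `max_gap` pixels below it (a hypothetical parameter). A region sitting above two or more captions is treated as under-segmented and split into side-by-side parts. Regions whose nearest caption is the same are treated as over-segmented and merged. Regions with no caption below them are left unchanged. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical caption pattern: a line that starts with "Figure N" or "Fig. N".
CAPTION_RE = re.compile(r"^\s*(figure|fig\.?)\s*\d+", re.IGNORECASE)


def h_overlap(a, b):
    """True if two (x0, y0, x1, y1) boxes overlap horizontally."""
    return min(a[2], b[2]) - max(a[0], b[0]) > 0


def captions_under(region, captions, max_gap):
    """Captions starting at most max_gap pixels below the region and overlapping it horizontally."""
    return [c for c in captions
            if 0 <= c[1] - region[3] <= max_gap and h_overlap(region, c)]


def bbox_union(boxes):
    """Smallest box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def split_by_captions(region, caps):
    """Cut a region into side-by-side parts at the midpoints between caption centres."""
    centers = sorted((c[0] + c[2]) / 2 for c in caps)
    cuts = [region[0]] + [(a + b) / 2 for a, b in zip(centers, centers[1:])] + [region[2]]
    return [(cuts[i], region[1], cuts[i + 1], region[3]) for i in range(len(centers))]


def refine_regions(regions, ocr_lines, max_gap=40):
    captions = [box for text, box in ocr_lines if CAPTION_RE.match(text)]
    # Under-segmentation: one region above k >= 2 captions becomes k regions.
    pieces = []
    for r in regions:
        caps = captions_under(r, captions, max_gap)
        pieces.extend(split_by_captions(r, caps) if len(caps) >= 2 else [r])
    # Over-segmentation: pieces that share the same nearest caption are merged.
    groups = defaultdict(list)
    for i, p in enumerate(pieces):
        caps = captions_under(p, captions, max_gap)
        key = min(caps, key=lambda c: c[1] - p[3]) if caps else ("uncaptioned", i)
        groups[key].append(p)
    return [bbox_union(g) for g in groups.values()]
```

A body-text line that happens to begin with "Figure 3 shows ..." would also match this pattern, so a practical system would likely need additional evidence to tell captions apart from in-text figure references.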