Document page similarity based on layout visual saliency: Application to query by example and document classification

Authors:
Véronique Eglin;Stéphane Bres
Venue:
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
Year:
2003

Abstract

In this paper we propose a measure of visual similarity for comparing pages in a corpus. The measure is based on an analysis of the visual layout saliency of the page composition, and it is computed from both the document layout and characteristics of the text itself. The text is characterized by statistical features derived from textural primitives. Our purpose is to establish perceptual links between documents in order to facilitate their storage and retrieval. We present two possible applications of this similarity measure: query by example over the corpus and document classification. In the first application, we extract the documents that are most visually similar to a document given as a query. In the second application, the similarity measure is used to classify the document under investigation according to its visual similarity to a reference set of documents. Our test corpus is extracted from the Finland MTDB Oulu multi-genre database, which provides a great diversity of page layouts and contents.
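
The two applications named in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' saliency-based measure, which the abstract does not specify. It assumes, purely for illustration, that each page has already been reduced to a fixed-length numeric feature vector, and it uses cosine similarity as a stand-in. The function names `cosine_similarity`, `query_by_example`, and `classify` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity between two page feature vectors (an assumption,
    not the measure defined in the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def query_by_example(query, corpus, k=3):
    """Return the names of the k corpus pages most similar to the query.

    corpus maps a page name to its (hypothetical) feature vector.
    """
    ranked = sorted(
        corpus,
        key=lambda name: cosine_similarity(query, corpus[name]),
        reverse=True,
    )
    return ranked[:k]


def classify(query, reference_set):
    """Assign the label of the most similar reference page (nearest neighbour).

    reference_set is a list of (label, feature_vector) pairs.
    """
    best_label, _ = max(
        reference_set,
        key=lambda item: cosine_similarity(query, item[1]),
    )
    return best_label


if __name__ == "__main__":
    # Made-up vectors; real features would come from layout and texture analysis.
    corpus = {
        "article_a": [0.9, 0.1, 0.0],
        "article_b": [0.8, 0.2, 0.1],
        "advert": [0.1, 0.2, 0.9],
    }
    query = [1.0, 0.1, 0.0]
    print(query_by_example(query, corpus, k=2))
    print(classify(query, [("article", corpus["article_a"]),
                           ("advertisement", corpus["advert"])]))
```

Both applications reduce to ranking by the same similarity function: retrieval keeps the top-ranked pages, while classification keeps only the label of the best match. Any other similarity measure with the same signature could be substituted without changing the rest of the sketch.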