Document page similarity based on layout visual saliency: Application to query by example and document classification

  • Authors:
  • Véronique Eglin;Stéphane Bres

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
  • Year:
  • 2003

Abstract

In this paper we propose to define a measure of visual similarity to compare different pages in a corpus. This measure is based on the analysis of the visual layout saliency of the page composition. This similarity is computed using both the document layout and characteristics of the text itself. The text characterization uses statistical features derived from textural primitives. Our purpose is to establish perceptive links between documents in order to facilitate their storage and their retrieval. In this paper we present two possible applications of this measure of similarity: the query of the corpus by example and the documents classification. In the first application, we extract documents that are the most visually similar to a document, given as query. In the second application, the similarity measure is used to classify the document under investigation using its visual similarity to a reference set of documents. Our test corpus is extracted from the Finland MTDB Oulu multi-genre database that provides a great diversity of page layouts and contents.
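
The two applications named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the feature extraction, so everything below is an assumption made for illustration: each page is taken to be already reduced to a numeric feature vector, cosine similarity stands in for the paper's layout-saliency measure, and the names `cosine_similarity`, `query_by_example`, and `classify` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity (assumption): cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def query_by_example(query, corpus, k=3):
    """Return the ids of the k corpus pages most similar to the query page.

    `corpus` maps a page id to its (hypothetical) feature vector.
    """
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [page_id for page_id, _ in ranked[:k]]


def classify(query, reference_set):
    """Assign the label of the most similar reference page (nearest neighbour).

    `reference_set` is a list of (label, feature_vector) pairs.
    """
    best_label, _ = max(
        reference_set,
        key=lambda pair: cosine_similarity(query, pair[1]),
    )
    return best_label


# Toy feature vectors, invented purely for this example.
corpus = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
reference_set = [("article", [1.0, 0.0, 0.0]), ("advertisement", [0.0, 1.0, 0.0])]

top_pages = query_by_example([1.0, 0.0, 0.0], corpus, k=2)
predicted = classify([0.0, 1.0, 0.1], reference_set)
```

Both functions depend only on the similarity function, so a measure closer to the one the authors describe could be swapped in without changing the retrieval or classification logic.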