Indexing by permeability in block structured web pages

  • Authors:
  • Emmanuel Bruno;Nicolas Faessel;Hervé Glotin;Jacques Le Maitre;Michel Scholl

  • Affiliations:
  • Université du Sud Toulon-Var, La Garde, France;Université Paul Cézanne, Marseille, France;Université du Sud Toulon-Var, La Garde, France;Université du Sud Toulon-Var, La Garde, France;CNAM, Paris, France

  • Venue:
  • Proceedings of the 9th ACM symposium on Document engineering
  • Year:
  • 2009

Abstract

In this paper we present a model that we have developed for indexing and querying web pages based on their visual rendering. In this model, pages are split up into a set of visual blocks. The indexing of a block takes into account its content, its visual importance and, by permeability, the indexing of neighboring blocks. A page is modeled as a directed acyclic graph. Each node is associated with a block and labeled by the coefficient of importance of this block. Each edge is labeled by the coefficient of permeability of the target node content to the source node content. Importance and permeability coefficients cannot be manually quantified. In the second part of this paper, we present an experiment consisting in learning optimal permeability coefficients by gradient descent for indexing images of a web page from the text blocks of this page. The dataset is drawn from real web pages of the training and test sets of the ImagEval task2 corpus. Results demonstrate an improvement of the indexing using non-uniform block permeabilities.
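
The graph model described in the abstract can be illustrated with a minimal sketch. The abstract does not give the combination formula, so the rule used here is an assumption: a block's index is its own term weights scaled by its importance, plus the index of each source block scaled by the permeability of the connecting edge. The function name `index_blocks` and the dictionary layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def index_blocks(content, importance, edges):
    """Propagate term weights through a block graph (illustrative only).

    content:    block id -> {term: weight} for the block's own text
    importance: block id -> importance coefficient
    edges:      (source, target) -> permeability of target to source
    Assumed rule (not taken from the paper):
        index[b] = importance[b] * content[b]
                   + sum over edges (s, b) of permeability * index[s]
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    indegree = {block: 0 for block in content}
    for (src, dst), perm in edges.items():
        incoming[dst].append((src, perm))
        outgoing[src].append(dst)
        indegree[dst] += 1

    # Process blocks in topological order so that every source block
    # is indexed before the blocks that are permeable to it.
    ready = [block for block in content if indegree[block] == 0]
    index = {}
    while ready:
        block = ready.pop()
        scores = defaultdict(float)
        for term, weight in content[block].items():
            scores[term] += importance[block] * weight
        for src, perm in incoming[block]:
            for term, weight in index[src].items():
                scores[term] += perm * weight
        index[block] = dict(scores)
        for dst in outgoing[block]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)

    if len(index) != len(content):
        raise ValueError("block graph contains a cycle")
    return index


# Example: an image block with no text of its own inherits terms
# from a neighboring caption block through a permeability edge.
content = {"caption": {"sunset": 1.0}, "image": {}}
importance = {"caption": 2.0, "image": 1.0}
edges = {("caption", "image"): 0.5}
page_index = index_blocks(content, importance, edges)
```

In this sketch the coefficients are supplied by hand; in the experiment described above, the permeability coefficients are instead learned by gradient descent.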