Visual structure-based web page clustering and retrieval

  • Authors:
  • Paul Bohunsky;Wolfgang Gatterbauer

  • Affiliations:
  • Vienna University of Technology, Vienna, Austria;University of Washington, Seattle, WA, USA

  • Venue:
  • Proceedings of the 19th international conference on World wide web
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Clustering and retrieval of web pages dominantly relies on analyzing either the content of individual web pages or the link structure between them. Some literature also suggests to use the structure of web pages, notably the structure of its DOM tree. However, little work considers the visual structure of web pages for clustering. In this paper (i) we motivate visual structure-based web page clustering and retrieval for a number of applications, (ii) we formalize a visual box model-based representation of web pages that supports new metrics of visual similarity, and (iii) we report on our current work on evaluating human perception of visual similarity of web pages and applying the learned visual similarity features to web page clustering and retrieval.