Web-site boundary detection

  • Authors: Ayesh Alshukri; Frans Coenen; Michele Zito

  • Affiliations: Dept. of Computer Science, The University of Liverpool, Liverpool, UK (all three authors)

  • Venue: ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
  • Year: 2010

Abstract

Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. This paper suggests a web-page clustering approach to boundary detection. The principal issue is feature selection, which is hampered by the lack of a clear understanding of what a web-site is. The paper proposes a definition of a web-site, founded on the principle of user intention and directed at the boundary detection problem. It then reports on a sequence of experiments that use a number of clustering techniques, and a wide range of features and combinations of features, to identify web-site boundaries. The preliminary results suggest that, in general, a combination of features produces the most appropriate result.
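
As a rough illustration of what clustering web pages on a combination of features might look like, the sketch below groups a handful of pages into two clusters and treats the cluster containing a chosen seed page as the web-site. It is not the authors' implementation: the abstract does not name the features or clustering techniques used, so the two feature groups (URL tokens and body-text words), the cosine similarity, the two-cluster k-means variant, the function names, and the example URLs are all assumptions made for illustration.

```python
import re
from collections import Counter


def page_features(url, text):
    """Combine two assumed feature groups: URL tokens and body-text words."""
    url_tokens = ["url:" + t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    text_tokens = ["txt:" + t for t in re.findall(r"[a-z]+", text.lower())]
    return Counter(url_tokens + text_tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as mappings."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: c / len(vectors) for k, c in total.items()}


def two_means(vectors, seed_index=0, iterations=10):
    """Two-cluster k-means using cosine similarity.

    Cluster 0 starts at the seed page; cluster 1 starts at the page
    least similar to the seed (an assumed, deterministic initialisation).
    """
    seed = vectors[seed_index]
    far = min(range(len(vectors)), key=lambda i: cosine(seed, vectors[i]))
    centroids = [dict(seed), dict(vectors[far])]
    labels = []
    for _ in range(iterations):
        labels = [
            0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            for v in vectors
        ]
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


# Hypothetical pages: (URL, body text). The first page is the seed.
pages = [
    ("http://www.example.org/dept/index.html", "computer science department home"),
    ("http://www.example.org/dept/staff.html", "computer science department staff"),
    ("http://www.example.org/dept/research.html", "computer science department research"),
    ("http://news.example.net/sport/today.html", "football results and sport news"),
    ("http://news.example.net/sport/week.html", "football fixtures and sport news"),
]

vectors = [page_features(url, text) for url, text in pages]
labels = two_means(vectors)

# Pages sharing the seed's cluster are taken to lie inside the boundary.
inside = [url for (url, _), lab in zip(pages, labels) if lab == labels[0]]
print(inside)
```

With this toy data the three department pages end up in the seed's cluster and the two news pages fall outside it. Dropping either feature group from `page_features` gives a simple way to compare single features against a combination, in the spirit of the experiments the abstract describes.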