Identifying websites with flow simulation

  • Authors:
  • Pierre Senellart

  • Affiliations:
  • École normale supérieure, Paris Cedex 05, France

  • Venue:
  • ICWE'05 Proceedings of the 5th international conference on Web Engineering
  • Year:
  • 2005

Abstract

We present a method for discovering the set of webpages that make up a logical website, based on the link structure of the Web graph. Such a method is useful for Web archiving and for computing website importance. To identify the boundaries of a website, we combine an online version of the preflow-push algorithm, an algorithm for the maximum flow problem in traffic networks, with the Markov Cluster (MCL) algorithm. MCL is run on a crawled portion of the Web graph to build a seed of initial webpages, and the preflow-push algorithm then extends this seed. An experiment on a subsite of the INRIA website is described.
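
The seed-extension step can be illustrated with a minimal sketch. It is not the paper's implementation: it runs an offline (not online) FIFO preflow-push on a fully known link graph, and it assumes the seed is already given rather than computed with MCL. The names `preflow_push` and `extend_seed`, the virtual `SOURCE` and `SINK` nodes, the unit capacity on every hyperlink, and the set of pages known to lie outside the site are all assumptions made for illustration. The pages on the source side of the minimum cut are taken as the website.

```python
from collections import defaultdict, deque

SOURCE, SINK = "<source>", "<sink>"  # hypothetical virtual nodes


def preflow_push(capacity, source, sink):
    """FIFO preflow-push. capacity[u][v] is the capacity of arc u -> v.
    Returns (maximum flow value, residual capacities)."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, targets in capacity.items():
        for v, c in targets.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse arc exists
    nodes = list(residual)
    height = {n: 0 for n in nodes}
    excess = {n: 0 for n in nodes}
    height[source] = len(nodes)
    active = deque()
    # Initial preflow: saturate every arc leaving the source.
    for v in list(residual[source]):
        c = residual[source][v]
        if c > 0:
            residual[source][v] = 0
            residual[v][source] += c
            excess[v] += c
            if v != sink:
                active.append(v)
    while active:
        u = active.popleft()
        while excess[u] > 0:  # discharge u completely
            pushed = False
            for v, c in residual[u].items():
                if c > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], c)
                    residual[u][v] -= delta
                    residual[v][u] += delta
                    excess[u] -= delta
                    if excess[v] == 0 and v not in (source, sink):
                        active.append(v)
                    excess[v] += delta
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:  # no admissible arc: relabel u
                height[u] = 1 + min(
                    height[v] for v, c in residual[u].items() if c > 0
                )
    return excess[sink], residual


def extend_seed(links, seed, outside):
    """Return the pages on the source side of a minimum cut that separates
    the seed pages from pages assumed to be outside the website."""
    big = len(links) + 1  # larger than any cut made only of hyperlinks
    capacity = defaultdict(dict)
    for a, b in links:
        if a != b:  # ignore self-links
            capacity[a][b] = 1  # assumption: unit capacity, both directions
            capacity[b][a] = 1
    for page in seed:
        capacity[SOURCE][page] = big
    for page in outside:
        capacity[page][SINK] = big
    _, residual = preflow_push(capacity, SOURCE, SINK)
    # Pages still reachable from the source in the residual graph.
    reached, queue = {SOURCE}, deque([SOURCE])
    while queue:
        u = queue.popleft()
        for v, c in residual[u].items():
            if c > 0 and v not in reached:
                reached.add(v)
                queue.append(v)
    return reached - {SOURCE}


# Toy graph: a densely linked site "s" with a single link to another site "x".
links = [
    ("s/index", "s/a"), ("s/index", "s/b"), ("s/a", "s/b"),
    ("s/a", "s/c"), ("s/b", "s/c"), ("s/c", "x/home"),
    ("x/home", "x/news"), ("x/home", "x/blog"), ("x/blog", "x/news"),
]
site = extend_seed(links, seed={"s/index"}, outside={"x/news"})
```

On this toy graph the only link between the two sites is the cheapest cut, so `site` contains the four `s/` pages. In an online setting the graph would instead be discovered incrementally during the crawl, which this sketch does not attempt.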