Identifying websites with flow simulation

Authors:
Pierre Senellart
Affiliations:
École normale supÉrieure, Paris Cedex 05, France
Venue:
ICWE'05 Proceedings of the 5th international conference on Web Engineering
Year:
2005

Citing 5
Cited 2

A new approach to the maximum-flow problem

Journal of the ACM (JACM)
Introduction to algorithms

Introduction to algorithms
Constructing, organizing, and visualizing collections of topically related Web resources

ACM Transactions on Computer-Human Interaction (TOCHI)
Efficient identification of Web communities

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A First Experience in Archiving the French Web

ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries

Web-site boundary detection

ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
Incremental web-site boundary detection using random walks

MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present in this paper a method to discover the set of webpages contained in a logical website, based on the link structure of the Web graph. Such a method is useful in the context of Web archiving and website importance computation. To identify the boundaries of a website, we combine the use of an online version of the preflow-push algorithm, an algorithm for the maximum flow problem in traffic networks, and of the Markov CLuster (MCL) algorithm. The latter is used on a crawled portion of the Web graph in order to build a seed of initial webpages, a seed which is extended using the former. An experiment on a subsite of the INRIA Website is described.