Web-site boundary detection

Authors:
Ayesh Alshukri;Frans Coenen;Michele Zito
Affiliations:
Dept. of Computer Science, The University of Liverpool, Liverpool, UK;Dept. of Computer Science, The University of Liverpool, Liverpool, UK;Dept. of Computer Science, The University of Liverpool, Liverpool, UK
Venue:
ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
Year:
2010

Abstract

Defining the boundaries of a web-site, for example for archiving or information retrieval purposes, is an important but complicated task. This paper proposes a web-page clustering approach to boundary detection. The principal issue is feature selection, which is hampered by the lack of a clear understanding of what a web-site is. The paper proposes a definition of a web-site, founded on the principle of user intention and directed at the boundary detection problem, and then reports on a sequence of experiments that use a number of clustering techniques and a wide range of features, individually and in combination, to identify web-site boundaries. The preliminary results suggest that, in general, a combination of features produces the most appropriate result.
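
The general idea of clustering pages on a combination of features can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not name the features or the clustering techniques used, so the two feature groups (URL tokens and title words), the two-cluster k-means with a deterministic start, the function names, and the example URLs are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse


def page_features(url, title_words, use_url=True, use_title=True):
    """Build a feature set for one page from two hypothetical feature groups."""
    features = set()
    if use_url:
        parsed = urlparse(url)
        features.add("host:" + parsed.netloc)
        features.update("path:" + p for p in parsed.path.split("/") if p)
    if use_title:
        features.update("title:" + w.lower() for w in title_words)
    return features


def to_vector(features, vocabulary):
    """Encode a feature set as a binary vector over a fixed vocabulary."""
    return [1.0 if term in features else 0.0 for term in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def detect_boundary(pages, seed_url, max_iter=20):
    """Two-cluster k-means; return the URLs clustered with the seed page."""
    urls = sorted(pages)
    vocabulary = sorted(set().union(*pages.values()))
    vectors = {u: to_vector(pages[u], vocabulary) for u in urls}
    # Deterministic start (an assumption): the seed page and the page farthest from it.
    farthest = max(urls, key=lambda u: squared_distance(vectors[u], vectors[seed_url]))
    centroids = [vectors[seed_url], vectors[farthest]]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            u: min((0, 1), key=lambda k: squared_distance(vectors[u], centroids[k]))
            for u in urls
        }
        if new_assignment == assignment:
            break  # assignments are stable
        assignment = new_assignment
        for k in (0, 1):
            members = [vectors[u] for u in urls if assignment[u] == k]
            if members:
                centroids[k] = mean(members)
    return {u for u in urls if assignment[u] == assignment[seed_url]}


# Made-up pages: three from one hypothetical site, two from another.
pages = {
    "http://dept.example.org/": page_features("http://dept.example.org/", ["Department", "Home"]),
    "http://dept.example.org/staff": page_features("http://dept.example.org/staff", ["Department", "Staff"]),
    "http://dept.example.org/research": page_features("http://dept.example.org/research", ["Department", "Research"]),
    "http://news.example.com/story": page_features("http://news.example.com/story", ["Daily", "News"]),
    "http://news.example.com/sport": page_features("http://news.example.com/sport", ["Daily", "Sport"]),
}
site = detect_boundary(pages, "http://dept.example.org/")
print(sorted(site))
```

Setting `use_url` or `use_title` to `False` in `page_features` restricts the clustering to a single feature group, which is one simple way to compare individual features against a combination on the same set of pages.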