Defining the boundaries of a web-site, for example for archiving or information retrieval purposes, is an important but complicated task. This paper suggests a web-page clustering approach to boundary detection. The principal issue is feature selection, which is hampered by the lack of a clear understanding of what a web-site is. The paper proposes a definition of a web-site, founded on the principle of user intention and directed at the boundary detection problem. It then reports on a sequence of experiments that use a number of clustering techniques and a wide range of features and combinations of features to identify web-site boundaries. The preliminary results suggest that, in general, a combination of features produces the most appropriate result.
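To make the idea of clustering web pages on a combination of features concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes two illustrative feature families (URL host and directory tokens, and outgoing link targets), Jaccard similarity, and single-link clustering with an arbitrary threshold of 0.3. The function names, the example URLs, and the threshold are all hypothetical.

```python
from urllib.parse import urlparse


def url_features(url):
    """URL feature family: the host plus each directory segment of the path.

    Assumption: the final path component is a file name and is ignored.
    """
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return {"host:" + parsed.netloc} | {"dir:" + s for s in segments[:-1]}


def link_features(links):
    """Link feature family: the set of outgoing link targets of a page."""
    return {"link:" + target for target in links}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.3):
    """Single-link clustering: merge pages whose combined-feature
    similarity reaches the (hypothetical) threshold."""
    urls = sorted(pages)
    feats = {u: url_features(u) | link_features(pages[u]) for u in urls}
    parent = {u: u for u in urls}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for i, a in enumerate(urls):
        for b in urls[i + 1:]:
            if jaccard(feats[a], feats[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for u in urls:
        clusters.setdefault(find(u), set()).add(u)
    return list(clusters.values())


def site_boundary(pages, seed, threshold=0.3):
    """Return the cluster containing the seed page, read as the web-site."""
    for cluster in cluster_pages(pages, threshold):
        if seed in cluster:
            return cluster
    raise KeyError(seed)


# Hypothetical crawl: each page URL maps to its outgoing links.
EXAMPLE_PAGES = {
    "http://example.org/dept/index.html": [
        "http://example.org/dept/staff.html",
        "http://example.org/dept/research.html",
    ],
    "http://example.org/dept/staff.html": [
        "http://example.org/dept/index.html",
        "http://example.org/dept/research.html",
    ],
    "http://example.org/dept/research.html": [
        "http://example.org/dept/index.html",
        "http://example.org/dept/staff.html",
    ],
    "http://example.org/shop/cart.html": ["http://example.org/shop/items.html"],
    "http://example.org/shop/items.html": ["http://example.org/shop/cart.html"],
}

if __name__ == "__main__":
    seed = "http://example.org/dept/index.html"
    for url in sorted(site_boundary(EXAMPLE_PAGES, seed)):
        print(url)
```

In this toy crawl every page shares the same host, so the host token alone does not separate the two groups. The directory and link features together push the similarity of pages within each group above the threshold, while cross-group pairs stay below it.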