ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
The paper describes variations of the classical k-means clustering algorithm that can be used effectively to address the so-called Web-site Boundary Detection (WBD) problem. The suggested advantages of these techniques are that they can quickly identify most of the pages belonging to a web-site and, in the long run, return a solution whose accuracy is comparable to, if not better than, that of other clustering methods. We analyze our techniques on artificial clones of the web generated using a well-known preferential attachment method.
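As a rough illustration of what a preferential attachment generator for an artificial web graph can look like, the sketch below grows a directed graph in which each new page links to existing pages with probability proportional to their in-degree plus one. The abstract does not say which preferential attachment model the paper uses, so this is an assumed, simplified variant; the function name `preferential_attachment_graph` and the parameters `n_nodes`, `links_per_node`, and `seed` are hypothetical.

```python
import random


def preferential_attachment_graph(n_nodes, links_per_node=2, seed=0):
    """Return a list of directed edges (source, target) between page ids.

    Assumed model (not taken from the paper): every new page creates
    `links_per_node` links to distinct existing pages, each chosen with
    probability proportional to (in-degree + 1).
    """
    core = links_per_node + 1
    if n_nodes < core:
        raise ValueError("n_nodes must be at least links_per_node + 1")
    rng = random.Random(seed)

    # Seed the graph with a small fully connected core of pages.
    edges = [(u, v) for u in range(core) for v in range(core) if u != v]

    # A page appears in `targets` once as a baseline and once per incoming
    # link, so a uniform draw from the list is a draw weighted by in-degree + 1.
    targets = list(range(core)) + [v for _, v in edges]

    for new in range(core, n_nodes):
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(rng.choice(targets))
        for v in sorted(chosen):
            edges.append((new, v))
            targets.append(v)
        # The new page becomes a possible target for later pages.
        targets.append(new)
    return edges


if __name__ == "__main__":
    graph = preferential_attachment_graph(50, links_per_node=2, seed=1)
    print(len(graph), "links generated")
```

A clustering method such as a k-means variant could then be run on features derived from such a graph; how the paper derives those features is not stated in the abstract.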