We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers and web archives and to improve the presentation of search results, among other applications. Our experiments show how common replication is on the web and demonstrate that our algorithm outperforms the others.
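As a rough illustration of what "overlap between all pairs of documents" can mean in practice, the sketch below measures overlap as the Jaccard similarity of word shingles and uses an inverted index so that only pairs sharing at least one shingle are ever compared. This is an assumption made for illustration, not the algorithm the abstract refers to; the names `shingles` and `pairwise_overlap`, the shingle length `k`, and the sample documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Short texts contribute a single shingle made of all their words.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def pairwise_overlap(docs, k=3):
    """Map each pair of document ids that share a shingle to its Jaccard overlap.

    Assumption: overlap is |A & B| / |A | B| over shingle sets. Pairs with no
    shingle in common are omitted, which avoids an explicit all-pairs scan.
    """
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}

    # Inverted index: shingle -> ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, doc_shingles in sets.items():
        for s in doc_shingles:
            index[s].add(doc_id)

    # Count shared shingles only for documents that co-occur in a posting list.
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    result = {}
    for (a, b), n in shared.items():
        union = len(sets[a]) + len(sets[b]) - n
        result[(a, b)] = n / union
    return result


if __name__ == "__main__":
    sample = {
        "a": "the quick brown fox jumps",
        "b": "the quick brown fox sleeps",
        "c": "completely different text here now",
    }
    for pair, score in pairwise_overlap(sample).items():
        print(pair, score)
```

The inverted index keeps the work proportional to the number of co-occurring shingles rather than to the square of the collection size, although very common shingles would still produce long posting lists and would need extra handling at web scale.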