Mirror, mirror on the Web: a study of host pairs with replicated content

Authors:
Krishna Bharat; Andrei Broder
Affiliations:
-;-
Venue:
WWW '99 Proceedings of the eighth international conference on World Wide Web
Year:
1999

Citing 0
Cited 39

Finding replicated Web collections

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Defining logical domains in a web site

HYPERTEXT '00 Proceedings of the eleventh ACM on Hypertext and hypermedia
Integrating content search with structure analysis for hypermedia retrieval and management

ACM Computing Surveys (CSUR)
Searching the Web

ACM Transactions on Internet Technology (TOIT)
Aliasing on the world wide web: prevalence and performance implications

Proceedings of the 11th international conference on World Wide Web
Reasoning for web document associations and its applications in site map construction

Data & Knowledge Engineering
Using Random Walks for Mining Web Document Associations

PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
A large-scale study of the evolution of web pages

WWW '03 Proceedings of the 12th international conference on World Wide Web
Searching the hypermedia web: improved topic distillation through network analytic relevance ranking

The New Review of Hypermedia and Multimedia - Hypermedia and the world wide web
Improving web search by the identification of contextual information

Intelligent exploration of the web
Mining Web Informative Structures and Contents Based on Entropy Analysis

IEEE Transactions on Knowledge and Data Engineering
A large-scale study of the evolution of web pages

Software—Practice & Experience - Special issue: Web technologies
Automatic identification of user goals in Web search

WWW '05 Proceedings of the 14th international conference on World Wide Web
LSH forest: self-tuning indexes for similarity search

WWW '05 Proceedings of the 14th international conference on World Wide Web
Characterizing a national community web

ACM Transactions on Internet Technology (TOIT)
Managing duplicates in a web archive

Proceedings of the 2006 ACM symposium on Applied computing
Efficient exact set-similarity joins

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Lazy preservation: reconstructing websites by crawling the crawlers

WIDM '06 Proceedings of the 8th annual ACM international workshop on Web information and data management
Estimating corpus size via queries

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Web Dragons: Inside the Myths of Search Engine Technology

Web Dragons: Inside the Myths of Search Engine Technology
Do not crawl in the DUST: different URLs with similar text

Proceedings of the 16th international conference on World Wide Web
Detecting near-duplicates for web crawling

Proceedings of the 16th international conference on World Wide Web
Mirror site maintenance based on evolution associations of web directories

Proceedings of the 16th international conference on World Wide Web
A cost-effective method for detecting web site replicas on search engine databases

Data & Knowledge Engineering
Bottom-k sketches: better and more efficient estimation of aggregates

Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Summarizing data using bottom-k sketches

Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Genealogical trees on the web: a search engine user perspective

Proceedings of the 17th international conference on World Wide Web
Improving web information indexing and retrieval based on center block duplication detection

International Journal of Innovative Computing and Applications
Tighter estimation using bottom k sketches

Proceedings of the VLDB Endowment
Do not crawl in the DUST: Different URLs with similar text

ACM Transactions on the Web (TWEB)
IRLbot: Scaling to 6 billion pages and beyond

ACM Transactions on the Web (TWEB)
Leveraging discarded samples for tighter estimation of multiple-set aggregates

Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Changing how people view changes on the web

Proceedings of the 22nd annual ACM symposium on User interface software and technology
Web Crawling

Foundations and Trends in Information Retrieval
Graph homomorphism revisited for graph matching

Proceedings of the VLDB Endowment
On the evolution of clusters of near-duplicate web pages

Journal of Web Engineering
A systematic study of parameter correlations in large scale duplicate document detection

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Enhancing duplicate collection detection through replica boundary discovery

PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Replica-aware caching for Web proxies

Computer Communications
