Finding replicated Web collections
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Defining logical domains in a web site
HYPERTEXT '00 Proceedings of the eleventh ACM on Hypertext and hypermedia
Integrating content search with structure analysis for hypermedia retrieval and management
ACM Computing Surveys (CSUR)
ACM Transactions on Internet Technology (TOIT)
Aliasing on the world wide web: prevalence and performance implications
Proceedings of the 11th international conference on World Wide Web
Reasoning for web document associations and its applications in site map construction
Data & Knowledge Engineering
Using Random Walks for Mining Web Document Associations
PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications
A large-scale study of the evolution of web pages
WWW '03 Proceedings of the 12th international conference on World Wide Web
Searching the hypermedia web: improved topic distillation through network analytic relevance ranking
The New Review of Hypermedia and Multimedia - Hypermedia and the world wide web
Improving web search by the identification of contextual information
Intelligent exploration of the web
Mining Web Informative Structures and Contents Based on Entropy Analysis
IEEE Transactions on Knowledge and Data Engineering
A large-scale study of the evolution of web pages
Software—Practice & Experience - Special issue: Web technologies
Automatic identification of user goals in Web search
WWW '05 Proceedings of the 14th international conference on World Wide Web
LSH forest: self-tuning indexes for similarity search
WWW '05 Proceedings of the 14th international conference on World Wide Web
Characterizing a national community web
ACM Transactions on Internet Technology (TOIT)
Managing duplicates in a web archive
Proceedings of the 2006 ACM symposium on Applied computing
Efficient exact set-similarity joins
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Lazy preservation: reconstructing websites by crawling the crawlers
WIDM '06 Proceedings of the 8th annual ACM international workshop on Web information and data management
Estimating corpus size via queries
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Web Dragons: Inside the Myths of Search Engine Technology
Do not crawl in the DUST: different URLs with similar text
Proceedings of the 16th international conference on World Wide Web
Detecting near-duplicates for web crawling
Proceedings of the 16th international conference on World Wide Web
Mirror site maintenance based on evolution associations of web directories
Proceedings of the 16th international conference on World Wide Web
A cost-effective method for detecting web site replicas on search engine databases
Data & Knowledge Engineering
Bottom-k sketches: better and more efficient estimation of aggregates
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Summarizing data using bottom-k sketches
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
Genealogical trees on the web: a search engine user perspective
Proceedings of the 17th international conference on World Wide Web
Improving web information indexing and retrieval based on center block duplication detection
International Journal of Innovative Computing and Applications
Tighter estimation using bottom k sketches
Proceedings of the VLDB Endowment
Do not crawl in the DUST: Different URLs with similar text
ACM Transactions on the Web (TWEB)
IRLbot: Scaling to 6 billion pages and beyond
ACM Transactions on the Web (TWEB)
Leveraging discarded samples for tighter estimation of multiple-set aggregates
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Changing how people view changes on the web
Proceedings of the 22nd annual ACM symposium on User interface software and technology
Foundations and Trends in Information Retrieval
Graph homomorphism revisited for graph matching
Proceedings of the VLDB Endowment
On the evolution of clusters of near-duplicate web pages
Journal of Web Engineering
A systematic study of parameter correlations in large scale duplicate document detection
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Enhancing duplicate collection detection through replica boundary discovery
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Replica-aware caching for Web proxies
Computer Communications