Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Machine Learning
Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Partial replica selection based on relevance for information retrieval
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Partial collection replication versus caching for information retrieval systems
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
An improved boosting algorithm and its application to text categorization
Proceedings of the ninth international conference on Information and knowledge management
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Building efficient and effective metasearch engines
ACM Computing Surveys (CSUR)
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery
A Study of Approaches to Hypertext Categorization
Journal of Intelligent Information Systems
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
SETS: search enhanced by topic segmentation
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
On-line learning for very large data sets: Research Articles
Applied Stochastic Models in Business and Industry - Statistical Learning
Training linear SVMs in linear time
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
Proceedings of the 24th international conference on Machine learning
Trust Region Newton Method for Logistic Regression
The Journal of Machine Learning Research
On the feasibility of geographically distributed web crawling
Proceedings of the 3rd international conference on Scalable information systems
A Study of the Impact of Index Updates on Distributed Query Processing for Web Search
ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Efficiency trade-offs in two-tier web search systems
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Quantifying performance and quality gains in distributed web search engines
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
The Geographical Life of Search
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
On the feasibility of multi-site web search engines
Proceedings of the 18th ACM conference on Information and knowledge management
ACM Transactions on Information Systems (TOIS)
A refreshing perspective of search engine caching
Proceedings of the 19th international conference on World wide web
Query forwarding in geographically distributed search engines
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Very high accuracy and fast dependency parsing is not a contradiction
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Assigning documents to master sites in distributed search
Proceedings of the 20th ACM international conference on Information and knowledge management
Reactive index replication for distributed search engines
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Document replication strategies for geographically distributed web search engines
Information Processing and Management: an International Journal
Improving the efficiency of multi-site web search engines
Proceedings of the 7th ACM international conference on Web search and data mining
Hi-index | 0.00 |
Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.