Collection selection: ...now, with more documents!

Authors:
Diego Puppin
Venue:
Proceedings of the 3rd international conference on Scalable information systems
Year:
2008

References (17):

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval

Effective retrieval with distributed collections
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval

The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7

Searching the Web
ACM Transactions on Internet Technology (TOIT)

Building a distributed full-text index for the web
ACM Transactions on Information Systems (TOIS)

Information Retrieval: Computational and Theoretical Aspects

Performance of Inverted Indices in Distributed Text Document Retrieval Systems
PDIS '93 Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems

Dynamic maintenance of web indexes using landmarks
WWW '03 Proceedings of the 12th international conference on World Wide Web

Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Operational requirements for scalable search systems
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

In-place versus re-build versus re-merge: index maintenance strategies for text retrieval systems
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26

Semantic Small World: An Overlay Network for Peer-to-Peer Search
ICNP '04 Proceedings of the 12th IEEE International Conference on Network Protocols

A statistics-based approach to incrementally update inverted files
Information Processing and Management: an International Journal

Query-driven document partitioning and collection selection
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems

The query-vector document model
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Analyzing imbalance among homogeneous index servers in a web search system
Information Processing and Management: an International Journal

Load-balancing and caching for collection selection architectures
Proceedings of the 2nd international conference on Scalable information systems

Cited by (1):

Tuning the capacity of search engines: Load-driven routing and incremental caching to reduce and balance the load
ACM Transactions on Information Systems (TOIS)

Abstract

One way to reduce the computational load on a distributed IR system is to partition the documents into collections and perform collection selection. With suitable training and/or modeling, the collection selection function can choose the most promising collections for each query with high confidence. Unfortunately, when the collections need to be updated, we must either retrain the selection function, update its statistics, or accept some loss in result quality. This paper introduces a simple but very effective technique for adding new documents to the collections of a system that uses collection selection. We show that the individual collections can be updated while guaranteeing the same selection performance, with no need to update or retrain the selection function.
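To make the setting concrete, here is a minimal Python sketch under loudly stated assumptions. `CollectionSelector` is a hypothetical toy selection function that ranks collections by the summed fraction of their documents containing each query term (a simple stand-in, not CORI and not the paper's model), and its statistics are frozen once it is built. The hypothetical `add_document` helper shows one plausible way to grow collections without touching the selector: treat the new document's terms as a query and append the document to the collection the frozen selector ranks first. This is an illustration of the problem the abstract describes, not a claim about the technique the paper actually proposes.

```python
from collections import Counter
from typing import Dict, List

Doc = List[str]  # a document as a list of terms (assumption for this sketch)


class CollectionSelector:
    """Hypothetical frozen selection function: per-collection term statistics
    are computed once at 'training' time and never updated afterwards."""

    def __init__(self, collections: Dict[str, List[Doc]]):
        # Document frequency of each term inside each collection.
        self.df: Dict[str, Counter] = {
            name: Counter(term for doc in docs for term in set(doc))
            for name, docs in collections.items()
        }
        # Collection sizes, also frozen at construction time.
        self.size: Dict[str, int] = {name: len(docs) for name, docs in collections.items()}

    def score(self, name: str, terms: Doc) -> float:
        # Sum, over distinct terms, of the fraction of documents containing the term.
        n = self.size[name] or 1
        return sum(self.df[name][t] / n for t in set(terms))

    def select(self, query: Doc, k: int) -> List[str]:
        # Rank collections by descending score (ties broken by name) and keep the top k.
        ranked = sorted(self.df, key=lambda c: (-self.score(c, query), c))
        return ranked[:k]


def add_document(selector: CollectionSelector,
                 collections: Dict[str, List[Doc]], doc: Doc) -> str:
    """Hypothetical update strategy: route the new document to the collection
    the frozen selector would pick for it, leaving the selector untouched."""
    target = selector.select(doc, k=1)[0]
    collections[target].append(doc)
    return target
```

Because `df` and `size` are computed only in the constructor, appending documents to the collections afterwards leaves the selector's statistics exactly as they were, which mirrors the "no retraining" constraint in the abstract. Whether routing documents this way actually preserves selection quality is an empirical question this toy does not answer.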