In an environment of distributed text collections, the first step in the information retrieval process is to identify which of the available collections are most relevant to a given query and should therefore be accessed to answer it. We address the challenge of collection selection when the available text collections overlap fully or partially, a scenario that has not been examined previously despite its real-world applications. To that end, we present COSCO, a collection selection approach that uses collection-specific coverage and overlap statistics. Our experimental results show that COSCO exhibits the desired behavior of retrieving more new results early in the collection order, and that it performs consistently and significantly better than CORI, previously considered one of the best collection selection systems.
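To illustrate why overlap statistics matter for ordering collections, here is a minimal greedy sketch. It is not the authors' exact COSCO procedure. It assumes hypothetical per-query estimates: `coverage[c]`, the number of results collection `c` is expected to return, and `overlap[frozenset({a, b})]`, the number of results two collections are expected to share. At each step it picks the collection with the largest estimated number of *new* results, approximated as its coverage minus its pairwise overlap with the collections already chosen. The function name `order_collections` and both input dictionaries are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List


def order_collections(
    coverage: Dict[str, float],
    overlap: Dict[FrozenSet[str], float],
) -> List[str]:
    """Greedily order collections by estimated new (non-duplicate) results.

    Hypothetical inputs (not from the paper):
      coverage[c]                 -- estimated results collection c returns
      overlap[frozenset({a, b})]  -- estimated results shared by a and b
    """
    order: List[str] = []
    remaining = set(coverage)

    def residual(c: str) -> float:
        # Approximate the new results c would add: its coverage minus its
        # pairwise overlap with each already-selected collection. Higher-order
        # overlaps are ignored, so this can undercount; clamp at zero.
        shared = sum(overlap.get(frozenset((c, s)), 0.0) for s in order)
        return max(coverage[c] - shared, 0.0)

    while remaining:
        # Iterate in sorted order so ties are broken deterministically
        # (max() keeps the first maximal element it sees).
        best = max(sorted(remaining), key=residual)
        order.append(best)
        remaining.remove(best)
    return order


if __name__ == "__main__":
    cov = {"A": 100.0, "B": 90.0, "C": 60.0}
    ov = {
        frozenset({"A", "B"}): 85.0,  # B largely duplicates A
        frozenset({"A", "C"}): 5.0,
        frozenset({"B", "C"}): 10.0,
    }
    print(order_collections(cov, ov))  # ['A', 'C', 'B']
```

With these made-up numbers, ranking by coverage alone would put B second, even though B adds only about 5 results that A has not already supplied. Accounting for overlap moves C ahead of B, which matches the behavior the abstract describes: more new results early in the collection order.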