Approaches to collection selection and results merging for distributed information retrieval

Authors:
Yves Rasolofo; Faïza Abbaci; Jacques Savoy
Affiliations:
Université de Neuchâtel, Neuchâtel, Switzerland; École nationale supérieure des Mines de Saint-Étienne, Saint-Étienne, France; Université de Neuchâtel, Neuchâtel, Switzerland
Venue:
Proceedings of the tenth international conference on Information and knowledge management
Year:
2001

Abstract

We have investigated two major issues in Distributed Information Retrieval (DIR), namely collection selection and search results merging. While most published work on these issues relies on pre-stored metadata, the approaches described in this paper extract the required information at the time the query is processed. To predict the relevance of each collection to a given query, we analyse a limited number of full documents (e.g., the top five) retrieved from each collection and then consider term proximity within them. Our merging technique is rather simple, since its only inputs are the document scores and the lengths of the result lists. Our experiments evaluate the retrieval effectiveness of these approaches and compare them with centralised indexing and various other DIR techniques (e.g., CORI). We used two testbeds: one containing news articles extracted from four different sources (2 GB) and another containing 10 GB of Web pages. Our evaluations show that the retrieval effectiveness of these simple approaches is worth considering.
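The two query-time steps summarised above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the scoring formulas, so the 1/distance proximity weight, the window size, the constant K and the log-damped length weighting are all hypothetical choices. Documents are assumed to arrive as token lists, and each collection's result-list length is taken to be the number of (doc_id, score) pairs it returned.

```python
import math


def proximity_score(doc_tokens, query_terms, window=5):
    """Sum 1/distance over pairs of *different* query terms that occur
    within `window` tokens of each other (hypothetical weighting)."""
    wanted = set(query_terms)
    # Positions of query-term occurrences, in document order.
    hits = [(pos, tok) for pos, tok in enumerate(doc_tokens) if tok in wanted]
    score = 0.0
    for a, (i, ti) in enumerate(hits):
        for j, tj in hits[a + 1:]:
            if j - i > window:
                break  # hits are sorted, so later ones are even farther away
            if ti != tj:
                score += 1.0 / (j - i)
    return score


def rank_collections(top_docs, query_terms, k=5, window=5):
    """Rank collections by the proximity evidence found in the first k
    documents each one returned for the query (k=5 mirrors the abstract's
    'top five' example)."""
    scores = {
        name: sum(proximity_score(doc, query_terms, window) for doc in docs[:k])
        for name, docs in top_docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def merge_results(result_lists, K=600):
    """Merge per-collection lists of (doc_id, score) into one ranking using
    only the raw scores and the length of each list.

    Hypothetical weighting: s_c = log(1 + len_c * K / total_len), and each
    raw score is multiplied by s_c / mean(s), so collections that returned
    more documents are boosted and the weights average to 1."""
    total = sum(len(docs) for docs in result_lists.values())
    if total == 0:
        return []
    s = {c: math.log(1 + len(docs) * K / total) for c, docs in result_lists.items()}
    mean_s = sum(s.values()) / len(s)
    merged = [
        (doc_id, score * s[c] / mean_s, c)
        for c, docs in result_lists.items()
        for doc_id, score in docs
    ]
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged
```

For example, merging {"A": [("a1", 7.8), ("a2", 5.0)], "B": [("b1", 8.0)]} ranks a1 above b1, because collection A returned the longer list. Both functions rely only on information fetched at query time (the top k documents and the result lists), with no pre-stored collection statistics, which is the property the abstract emphasises.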