The effectiveness of GIOSS for the text database discovery problem
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Server selection on the World Wide Web
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Query-based sampling of text databases
ACM Transactions on Information Systems (TOIS)
A language modeling framework for resource selection and results merging
Proceedings of the eleventh international conference on Information and knowledge management
QProber: A system for automatic classification of hidden-Web databases
ACM Transactions on Information Systems (TOIS)
Server Ranking for Distributed Text Retrieval Systems on the Internet
Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
Evaluating different methods of estimating retrieval quality for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Comparing the performance of collection selection algorithms
ACM Transactions on Information Systems (TOIS)
A semisupervised learning method to merge search engine results
ACM Transactions on Information Systems (TOIS)
ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Unified utility maximization framework for resource selection
Proceedings of the thirteenth ACM international conference on Information and knowledge management
A two-phase sampling technique for information extraction from hidden web databases
Proceedings of the 6th annual ACM international workshop on Web information and data management
Improving text collection selection with coverage and overlap statistics
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Server selection methods in hybrid portal search
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Modeling search engine effectiveness for federated search
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Capturing collection size for distributed non-cooperative retrieval
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Distributed query sampling: a quality-conscious approach
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Using query logs to establish vocabularies in distributed information retrieval
Information Processing and Management: an International Journal
Compact features for detection of near-duplicates in distributed retrieval
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Adaptive query-based sampling of distributed collections
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Sample sizes for query probing in uncooperative distributed information retrieval
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Ranking information resources in peer-to-peer text retrieval: an experimental study
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Robust result merging using sample-based score estimates
ACM Transactions on Information Systems (TOIS)
Learning from past queries for resource selection
Proceedings of the 18th ACM conference on Information and knowledge management
SourceRank: relevance and trust assessment for deep web sources based on inter-source agreement
Proceedings of the 20th international conference on World wide web
Foundations and Trends in Information Retrieval
Peer-to-Peer Information Retrieval: An Overview
ACM Transactions on Information Systems (TOIS)
Towards benefit-based RDF source selection for SPARQL queries
SWIM '12 Proceedings of the 4th International Workshop on Semantic Web Information Management
Federated search in the wild: the combined power of over a hundred search engines
Proceedings of the 21st ACM international conference on Information and knowledge management
Studying the clustering paradox and scalability of search in highly distributed environments
ACM Transactions on Information Systems (TOIS)
Assessing relevance and trust of the deep web sources and results based on inter-source agreement
ACM Transactions on the Web (TWEB)
Search result diversification in resource selection for federated search
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Agreement based source selection for the multi-topic deep web integration
Proceedings of the 17th International Conference on Management of Data
Hi-index | 0.00 |
In federated text retrieval systems, the query is sent to multiple collections at the same time. The results returned by collections are gathered and ranked by a central broker that presents them to the user. It is usually assumed that the collections have little overlap. However, in practice collections may share many common documents as either exact or near duplicates, potentially leading to high numbers of duplicates in the final results. Considering the natural band width restrictions and efficiency issues of federated search, sendingqueries to redundant collections leads to unnecessary costs. We propose a novel method for estimating the rate of over-lap among collections based on sampling. Then, using theestimated overlap statistics, we propose two collection selection methods that aim to maximize the number of unique relevant documents in the final results. We show experimentally that, although our estimates of overlap are not in exact, our suggested techniques can significantly improve the search effectiveness when collections overlap.