Viewing morphology as an inference process
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Searching distributed collections with inference networks
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Cluster-based language models for distributed retrieval
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
GlOSS: text-source discovery over the Internet
ACM Transactions on Database Systems (TODS)
ACM Transactions on Information Systems (TOIS)
Information Retrieval
Relevant document distribution estimation method for resource selection
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
SETS: search enhanced by topic segmentation
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Multi-Tier Architecture for Web Search Engines
LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress
Operational requirements for scalable search systems
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Combining the language model and inference network approaches to retrieval
Information Processing and Management: an International Journal - Special issue: Bayesian networks and information retrieval
Optimization strategies for complex queries
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A Markov random field model for term dependencies
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Query-driven document partitioning and collection selection
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
A pipelined architecture for distributed text query evaluation
Information Retrieval
Distributed search over the hidden web: hierarchical database sampling and selection
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Efficiency trade-offs in two-tier web search systems
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
SUSHI: scoring scaled samples for server selection
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Classification-based resource selection
Proceedings of the 18th ACM conference on Information and knowledge management
ACM Transactions on Information Systems (TOIS)
Central-rank-based collection selection in uncooperative distributed information retrieval
ECIR'07 Proceedings of the 29th European conference on IR research
Query forwarding in geographically distributed search engines
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Document allocation policies for selective searching of distributed indexes
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Content-based peer-to-peer network overlay for full-text federated search
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Semantic overlay networks for p2p systems
AP2PC'04 Proceedings of the Third international conference on Agents and Peer-to-Peer Computing
An evaluation of a cluster-based architecture for peer-to-peer information retrieval
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Taily: shard selection using the tail of score distributions
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Rank-energy selective query forwarding for distributed search systems
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Hi-index | 0.00 |
Large document collections can be partitioned into 'topical shards' to facilitate distributed search. In a low-resource search environment only a few of the shards can be searched in parallel. Such a search environment faces two intertwined challenges. First, determining which shards to consult for a given query: shard ranking. Second, how many shards to consult from the ranking: cutoff estimation. In this paper we present a family of three algorithms that address both of these problems. As a basis we employ a commonly used data structure, the central sample index (CSI), to represent the shard contents. Running a query against the CSI yields a flat document ranking that each of our algorithms transforms into a tree structure. A bottom up traversal of the tree is used to infer a ranking of shards and also to estimate a stopping point in this ranking that yields cost-effective selective distributed search. As compared to a state-of-the-art shard ranking approach the proposed algorithms provide substantially higher search efficiency while providing comparable search effectiveness.