We present a novel strategy to partition a document collection onto several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vector model: each document is represented as a list recording the queries for which the document is a match, along with their ranks. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The document clusters are then assigned to the underlying IR servers, while the query clusters represent queries that return similar results and are used for collection selection. First, we show that this document partitioning strategy greatly boosts the performance of standard collection selection algorithms, including CORI, with respect to a round-robin assignment. Second, we show that by performing collection selection through matching the query to the existing query clusters and then choosing only one server, we reach an average precision-at-5 of up to 1.74 and consistently improve on CORI's precision by between 11% and 15%. As a side result, we show a way to identify rarely requested documents. Separating these documents from the rest of the collection allows the indexer to produce a more compact index containing only relevant documents that are likely to be requested in the future. In our tests, around 52% of the documents (3,128,366) are not returned among the top 100 results of any query.
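To make the query-vector representation concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a training query log has already been run against some reference search engine, so that each query maps to a ranked list of document ids. The function names, the toy data, and the `top_k` parameter are all hypothetical, and the co-clustering step itself is not shown.

```python
from collections import defaultdict


def build_query_vectors(query_results, top_k=100):
    """Map each document id to a list of (query, rank) pairs.

    query_results: dict mapping a query string to its ranked list of
    document ids (best match first). This input format is an assumption
    made for illustration.
    """
    vectors = defaultdict(list)
    for query, ranked_docs in query_results.items():
        # Only the first top_k results of each query contribute.
        for rank, doc_id in enumerate(ranked_docs[:top_k], start=1):
            vectors[doc_id].append((query, rank))
    return dict(vectors)


def split_never_returned(all_doc_ids, vectors):
    """Separate documents that no training query returned from the rest."""
    returned = [d for d in all_doc_ids if d in vectors]
    never_returned = [d for d in all_doc_ids if d not in vectors]
    return returned, never_returned


# Hypothetical toy data: two training queries over five documents.
query_results = {
    "apple pie": ["d1", "d3"],
    "apple tart": ["d3", "d1", "d4"],
}
all_docs = ["d1", "d2", "d3", "d4", "d5"]

vectors = build_query_vectors(query_results)
returned, never_returned = split_never_returned(all_docs, vectors)
```

In this toy example, `d2` and `d5` end up with no query-vector, which corresponds to the rarely requested documents that the abstract proposes keeping apart from the main index. The remaining documents and their `(query, rank)` lists would be the input to the co-clustering step.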