The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms for improving IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). A cache returns results only when a query exactly matches a previous one. A partial replica is a form of cache that returns results whenever the IR technology determines that the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, similarity improves query locality by up to 15% over exact match. We use a validated simulator to compare the performance of the two mechanisms, and find that even if the partial replica hit rate increases by only 3 to 6%, partial replication will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance.
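The difference between exact-match locality and similarity-based locality can be illustrated with a small sketch. It is not the paper's method: the similarity rule used here (two queries are similar if they contain the same set of lowercased terms) is a hypothetical stand-in for the paper's restrictive definition, the trace is made up, and storage is assumed to be unbounded. The function names `exact_hit_rate`, `similar_hit_rate`, and `similarity_key` are likewise illustrative.

```python
from typing import Iterable


def exact_hit_rate(trace: Iterable[str]) -> float:
    """Fraction of queries whose exact string was seen earlier in the trace."""
    seen = set()
    hits = total = 0
    for query in trace:
        total += 1
        if query in seen:
            hits += 1
        else:
            seen.add(query)
    return hits / total if total else 0.0


def similarity_key(query: str) -> frozenset:
    """Hypothetical similarity rule: same lowercased terms, order ignored."""
    return frozenset(query.lower().split())


def similar_hit_rate(trace: Iterable[str]) -> float:
    """Fraction of queries that are similar to some earlier query."""
    seen = set()
    hits = total = 0
    for query in trace:
        total += 1
        key = similarity_key(query)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / total if total else 0.0


# Made-up trace for illustration only; not taken from THOMAS or Excite.
trace = [
    "health care reform",
    "Health Care Reform",
    "reform health care",
    "tax bill",
    "health care reform",
    "tax bill",
]

exact = exact_hit_rate(trace)
similar = similar_hit_rate(trace)
print(f"exact-match hit rate: {exact:.2f}")
print(f"similarity hit rate:  {similar:.2f}")
```

On this toy trace the similarity rule counts the case and word-order variants as hits, which exact matching misses. A real partial replica would decide matches with the IR system's own ranking machinery rather than a set comparison.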