Partial collection replication versus caching for information retrieval systems

Authors:
Zhihong Lu;Kathryn S. McKinley
Affiliations:
Village Networks, Inc., Hazlet, NJ;Department of Computer Science, University of Massachusetts, Amherst, MA
Venue:
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2000

Abstract

The explosion of content in distributed information retrieval (IR) systems requires new mechanisms to attain timely and accurate retrieval of unstructured text. In this paper, we compare two mechanisms to improve IR system performance: partial collection replication and caching. When queries have locality, both mechanisms return results more quickly than sending queries to the original collection(s). A cache returns results only when a query exactly matches a previous one. Partial replicas are a form of caching that returns results when the IR technology determines that the query is a good match. Caches are simpler and faster, but replicas can increase locality by detecting similarity between queries that are not exactly the same. We use real traces from THOMAS and Excite to measure query locality and similarity. With a very restrictive definition of query similarity, matching on similarity improves query locality by up to 15% over exact match. We use a validated simulator to compare the performance of the two mechanisms, and find that even if the partial replica hit rate increases by only 3 to 6%, partial replication will outperform simple caching under a variety of configurations. A combined approach will probably yield the best performance.
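
The difference between exact-match locality and similarity-based locality can be illustrated with a small sketch. It is not the authors' measurement code: the abstract does not state the paper's definition of query similarity, so the `term_set_key` function below (two queries count as similar when they contain the same lowercased terms, in any order) is an assumption chosen only for illustration, and the five-query trace is made up rather than taken from THOMAS or Excite.

```python
from typing import Callable, Hashable, Iterable


def hit_rate(queries: Iterable[str],
             key: Callable[[str], Hashable] = lambda q: q) -> float:
    """Fraction of queries whose key was already seen earlier in the trace.

    With the default identity key this models an exact-match cache.
    """
    seen = set()
    hits = 0
    total = 0
    for query in queries:
        total += 1
        k = key(query)
        if k in seen:
            hits += 1
        else:
            seen.add(k)
    return hits / total if total else 0.0


def term_set_key(query: str) -> frozenset:
    """Hypothetical similarity key (an assumption, not the paper's definition):
    the set of lowercased terms, ignoring order and letter case."""
    return frozenset(query.lower().split())


# Made-up trace for illustration only.
trace = ["tax reform", "Tax Reform", "reform tax", "health care", "tax reform"]

exact = hit_rate(trace)                      # only the last query repeats exactly
similar = hit_rate(trace, key=term_set_key)  # queries 2, 3 and 5 match query 1
print(f"exact-match hit rate: {exact:.2f}")
print(f"similarity hit rate:  {similar:.2f}")
```

On this toy trace the similarity key finds more repeated queries than exact string matching does, which is the kind of locality gain the abstract attributes to partial replicas. The size of the gain here reflects the made-up trace and says nothing about the 15% figure reported for the real traces.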