Caching and database scaling in distributed shared-nothing information retrieval systems

Authors:
Anthony Tomasic; Hector Garcia-Molina
Affiliations:
Stanford University, Department of Computer Science, Margaret Jacks Hall, Stanford, CA (both authors)
Venue:
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Year:
1993

References (8)

Access methods for text. ACM Computing Surveys (CSUR) - Annals of discrete mathematics, 24.
Parallel Querying of Large Databases: A Case Study. Computer.
Partitioned posting files: a parallel inverted file structure for information retrieval. SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval.
Parallel text searching in serial files using a processor farm. SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval.
Retrieval performance of a distributed text database utilizing a parallel processor document server. DPDS '90 Proceedings of the second international symposium on Databases in parallel and distributed systems.
Performance of inverted indices in shared-nothing distributed text document information retrieval systems. PDIS '93 Proceedings of the second international conference on Parallel and distributed information systems.
Transaction Processing: Concepts and Techniques.
Performance Measurements of the First RAID Prototype.

Cited by (6)

Interaction of query evaluation and buffer management for information retrieval. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Rank-preserving two-level caching for scalable search engines. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval.
Query processing and inverted indices in shared-nothing text document information retrieval systems. The VLDB Journal — The International Journal on Very Large Data Bases - Parallelism in database systems.
Hybrid Partition Inverted Files: Experimental Validation. ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries.
A refreshing perspective of search engine caching. Proceedings of the 19th international conference on World Wide Web.
A five-level static cache architecture for web search engines. Information Processing and Management: an International Journal.

Abstract

A common class of existing information retrieval systems provides access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, and related fields. In this paper, we study this database using a trace-driven simulation. We focus on physical index design, inverted index caching, and database scaling in a distributed shared-nothing system, and show that all three have a strong effect on response time and throughput. We explore database scaling in two ways. The first assumes an “optimal” configuration for a single host and then scales the database linearly by duplicating the host architecture as needed. The second determines the optimal number of hosts for a fixed database size.
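The abstract does not describe how the simulator models caching. As a rough illustration of what a trace-driven study of inverted index caching measures, the sketch below replays a query trace against a per-host LRU cache of inverted lists in a shared-nothing layout. Everything here is an assumption for illustration, not the paper's design: the names `LRUCache`, `host_for`, and `simulate` are hypothetical, terms are assigned to hosts by a CRC32 hash, the replacement policy is LRU, and cache capacity is counted in whole inverted lists rather than bytes.

```python
import zlib
from collections import OrderedDict
from typing import Dict, Sequence


class LRUCache:
    """Hypothetical per-host cache holding at most `capacity` inverted lists."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._terms: "OrderedDict[str, None]" = OrderedDict()

    def access(self, term: str) -> bool:
        """Return True on a hit; on a miss, cache the term's list, evicting the LRU list if full."""
        if term in self._terms:
            self._terms.move_to_end(term)  # mark as most recently used
            return True
        if self.capacity > 0:
            self._terms[term] = None
            if len(self._terms) > self.capacity:
                self._terms.popitem(last=False)  # evict the least recently used list
        return False


def host_for(term: str, num_hosts: int) -> int:
    """Assumed term-partitioned placement: CRC32 maps a term to the same host on every run."""
    return zlib.crc32(term.encode("utf-8")) % num_hosts


def simulate(trace: Sequence[Sequence[str]], num_hosts: int, lists_per_host: int) -> Dict[str, int]:
    """Replay a query trace; each query term fetches one inverted list from its host."""
    caches = [LRUCache(lists_per_host) for _ in range(num_hosts)]
    hits = misses = 0
    for query in trace:
        for term in query:
            if caches[host_for(term, num_hosts)].access(term):
                hits += 1
            else:
                misses += 1  # a miss stands in for a disk read of the inverted list
    return {"hits": hits, "misses": misses}


if __name__ == "__main__":
    trace = [["cache", "index"], ["cache"], ["index", "disk"], ["cache"]]
    print(simulate(trace, num_hosts=1, lists_per_host=2))
```

Replaying the four-query trace in the `__main__` block on a single host with room for two lists gives two hits and four misses. The sketch counts only hits and misses; it does not model the response time or throughput that the abstract reports on.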