A Cache-Based Distributed Terabyte Text Retrieval System in CADAL

Authors:
Jun Cheng; Wen Gao; Bin Liu; Tie-jun Huang; Ling Zhang
Venue:
ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
Year:
2002

Abstract

The China-America Digital Academic Library (CADAL) project aims to create a searchable collection of one million digital books freely available over the Internet, which requires a text retrieval system that scales to terabytes of text. This paper presents a cache-based, distributed terabyte text retrieval system that combines full-text retrieval, distributed computing, and caching. Data is distributed across index servers by subject, so each query is searched only on the index servers relevant to it. Cache servers reduce response time. To reduce network load, the system returns only highly relevant search results for each query. A prototype system demonstrates the effectiveness of the design.
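
The three ideas in the abstract (subject-based partitioning, a cache in front of the index servers, and returning only the most relevant results) can be illustrated with a minimal Python sketch. It is not the CADAL implementation. The names `IndexServer` and `CachedRouter`, the term-count scoring, the LRU eviction policy, and the assumption that the caller supplies the subject of each query are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class IndexServer:
    """Hypothetical in-memory stand-in for the index server of one subject."""

    def __init__(self, docs):
        self.docs = docs  # maps doc_id -> document text

    def search(self, query):
        # Toy relevance score (an assumption): total occurrences of query terms.
        terms = query.lower().split()
        results = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                results.append((doc_id, score))
        return results


class CachedRouter:
    """Routes a query to one subject's index server, behind an LRU cache."""

    def __init__(self, servers, cache_size=128, top_k=3):
        self.servers = servers      # maps subject -> IndexServer
        self.cache = OrderedDict()  # maps (subject, query) -> top results
        self.cache_size = cache_size
        self.top_k = top_k
        self.index_lookups = 0      # counts queries that reached an index server

    def query(self, subject, text):
        key = (subject, text.lower())
        if key in self.cache:
            # Cache hit: no index server is contacted.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: search only the index server for this subject.
        self.index_lookups += 1
        hits = self.servers[subject].search(text)
        hits.sort(key=lambda hit: (-hit[1], hit[0]))
        top = hits[: self.top_k]    # return only the highest-scoring results
        self.cache[key] = top
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return top
```

In this sketch a repeated query is answered from the cache without touching any index server, and a query on one subject never scans documents held by another subject's server. How the real system assigns a query to subjects, scores relevance, or manages its cache is not stated in the abstract.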