Toward a distributed terabyte text retrieval system in China-US million book digital library
Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
Scalable distributed architectures for information retrieval
The China-America Digital Academic Library (CADAL) project aims to create a searchable collection of one million digital books freely available over the Internet. This requires a text retrieval system that operates at terabyte scale. This paper presents a cache-based, distributed terabyte text retrieval system that combines full-text retrieval, distributed computing, and caching techniques. Data is distributed by subject across index servers, so a query is searched only on specific index servers rather than on all of them. Cache servers reduce response time. To reduce the load on the network, the system returns only highly relevant results for each query. A prototype system demonstrates the effectiveness of the design.
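The three ideas in the abstract (subject-partitioned index servers, a cache in front of them, and returning only the top-ranked results) can be illustrated with a minimal in-memory sketch. It is not the CADAL implementation: the class names `IndexServer` and `CachedRouter`, the term-count scoring, the LRU eviction policy, and the assumption that each query arrives already labeled with a subject are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class IndexServer:
    """Hypothetical stand-in for one subject-specific index server."""

    def __init__(self, docs):
        self.docs = docs  # {doc_id: full text}

    def search(self, query, top_k):
        # Assumed scoring: total occurrences of the query terms in the text.
        terms = query.lower().split()
        scored = []
        for doc_id, text in self.docs.items():
            words = text.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                scored.append((score, doc_id))
        # Highest score first; ties broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        # Return only the top_k results to keep the response small.
        return [doc_id for _, doc_id in scored[:top_k]]


class CachedRouter:
    """Routes a query to one subject's server, with an LRU result cache."""

    def __init__(self, servers, cache_size=128):
        self.servers = servers      # {subject: IndexServer}
        self.cache = OrderedDict()  # (subject, query, top_k) -> results
        self.cache_size = cache_size
        self.hits = 0

    def search(self, subject, query, top_k=3):
        key = (subject, query.lower(), top_k)
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return list(self.cache[key])
        # Only the server for this subject is contacted.
        server = self.servers.get(subject)
        results = server.search(query, top_k) if server else []
        self.cache[key] = results
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return list(results)
```

In this sketch a repeated query is answered from the cache without touching any index server, and a query for one subject never scans the documents held for another subject.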