Toward a distributed terabyte text retrieval system in China-US million book digital library

  • Authors:
  • Bin Liu;Wen Gao;Ling Zhang;Tie-jun Huang;Xiao-ming Zhang;Jun Cheng

  • Affiliations:
  • Chinese Academy of Sciences, Beijing, P.R.China;Chinese Academy of Sciences, Beijing, P.R.China;Chinese Academy of Sciences, Beijing, P.R.China;Chinese Academy of Sciences, Beijing, P.R.China;Chinese Academy of Sciences, Beijing, P.R.China;Chinese Academy of Sciences, Beijing, P.R.China

  • Venue:
  • Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2002

Abstract

In the China-US Million Book Digital Library, the digitization process produces more than one terabyte of text in OEB and PDF formats. To access these data quickly and accurately, we are developing a distributed terabyte text retrieval system. With a query cache, the system can search less data while maintaining acceptable retrieval accuracy. From each OEB package, we extract metadata and structural information to implement multi-scale indexing and retrieval. We also plan to explore new retrieval models and text clustering approaches in the Digital Library.
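The abstract does not describe how the query cache lets the system search less data. One plausible reading, given here purely as an assumption and not as the authors' design, is a query-driven document subset: documents retrieved by recent queries are kept in a bounded cache, new queries are first evaluated only against that subset, and the full index is consulted only when the subset yields too few hits. The class `QueryDrivenCache`, its parameters `capacity` and `min_hits`, and the in-memory inverted index (a dict from term to a set of document ids) are all hypothetical.

```python
from collections import OrderedDict


class QueryDrivenCache:
    """Hypothetical sketch: answer queries from documents that recent queries
    retrieved, falling back to the full index when too few hits are found."""

    def __init__(self, full_index, capacity=1000, min_hits=3):
        # full_index: assumed in-memory inverted index, term -> set of doc ids
        self.full_index = full_index
        self.capacity = capacity      # max number of cached doc ids (assumption)
        self.min_hits = min_hits      # fallback threshold (assumption)
        self.cached_docs = OrderedDict()  # doc id -> None, oldest first

    def _search(self, terms, restrict=None):
        # Conjunctive (AND) query; optionally restricted to a doc-id subset
        postings = [self.full_index.get(t, set()) for t in terms]
        if not postings:
            return set()
        hits = set.intersection(*postings)  # returns a new set
        if restrict is not None:
            hits &= restrict
        return hits

    def _touch(self, doc_ids):
        # Mark retrieved docs as most recently used, then evict the oldest
        for d in sorted(doc_ids):
            self.cached_docs[d] = None
            self.cached_docs.move_to_end(d)
        while len(self.cached_docs) > self.capacity:
            self.cached_docs.popitem(last=False)

    def query(self, text):
        terms = text.lower().split()
        # First pass: search only the cached subset (less data scanned)
        hits = self._search(terms, restrict=set(self.cached_docs))
        source = "cache"
        if len(hits) < self.min_hits:
            # Too few results: fall back to the full collection
            hits = self._search(terms)
            source = "full"
        self._touch(hits)
        return sorted(hits), source
```

With a toy index where `library` and `digital` co-occur in documents 3, 4, and 5, a first query `"china library"` goes to the full index and caches documents 2, 3, 4; a following query `"digital library"` is then answered from the cache as `[3, 4]`, missing document 5. That omission is the accuracy cost of scanning less data, and `min_hits` is the knob that decides when the sketch gives up on the subset.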