Telcordia LSI Engine: Implementation and Scalability Issues

  • Authors:
  • Chung-Min Chen; Ned Stoffel; Mike Post; Chumki Basu; Devasis Bassu; Clifford Behrens

  • Venue:
  • RIDE '01 Proceedings of the 11th International Workshop on Research Issues in Data Engineering
  • Year:
  • 2001

Abstract

Latent Semantic Indexing (LSI), a vector space-based approach to information retrieval, has been proven to be an effective tool in correlating and retrieving relevant documents. While much work has been published on LSI, most of it addresses the algorithmic or theoretical basis of the model. Little, if any, presents implementation issues in practice. In this paper, we describe a production-level implementation of LSI. The system integrates components including document collection and preprocessing, singular value decomposition (SVD), multilingual processing, and a tree-based access method for similarity querying. We discuss implementation issues encountered during the development of the system. In particular, we address scalability issues in the query engine and various components of the system, and present lessons learned.