Performance and implications of semantic indexing in a distributed environment

  • Authors:
  • Conrad T. K. Chang;Bruce R. Schatz

  • Affiliations:
  • Community Architectures for Network Information Systems (CANIS), Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 704 S. Sixth Street, Champaign, IL;Community Architectures for Network Information Systems (CANIS), Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 704 S. Sixth Street, Champaign, IL

  • Venue:
  • Proceedings of the eighth international conference on Information and knowledge management
  • Year:
  • 1999

Abstract

We present a research prototype for semantic indexing and retrieval. The prototype is motivated by the goal of making information retrieval more efficient and effective than the current state of the art. We give an overview of the layers of the Interspace architecture and develop an object model that supports semantic operations. The model provides a rich set of classes and relationships over the data used by the semantic indexing module. Our semantic indexing is based on the creation of concept spaces. A concept space is an index of a collection that uses document statistics to capture the relationships between concepts. It boosts text search by suggesting alternative terms that are semantically related to the query terms. Over the years, we have developed generic technology for computing concept spaces on large collections across many subjects. Recent computations on discipline-scale collections were carried out on high-end supercomputers. This paper describes our implementation of the computation in a distributed computing environment and its implications. Experimental results with different collection sizes and numbers of processes show the feasibility of this approach. We also show that laboratory and community collections are already easily computable on a group of PCs in a lab using a message-passing model. We conclude that PC clusters will shortly be able to compute semantic indexes for any real collection.
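
As a rough illustration of the idea described in the abstract, the following Python sketch builds a toy concept space from document co-occurrence statistics and uses it for term suggestion. It is not the authors' implementation: the function names (`count_partition`, `merge_counts`, `build_concept_space`, `suggest_terms`) are hypothetical, and the weighting used here (the fraction of documents containing one term that also contain the other) is an assumed stand-in for whatever statistic the paper actually uses. Partial counts are computed per partition and then summed, which is one plausible way the work could be divided among processes that exchange messages; here the partitions are simply processed in a loop.

```python
from collections import Counter
from itertools import combinations


def count_partition(docs):
    """Count document and co-occurrence frequencies for one partition.

    docs is a list of documents, each a list of already-extracted terms.
    """
    doc_freq = Counter()
    co_freq = Counter()
    for terms in docs:
        unique = sorted(set(terms))  # each term counts once per document
        doc_freq.update(unique)
        co_freq.update(combinations(unique, 2))  # unordered term pairs
    return doc_freq, co_freq


def merge_counts(partials):
    """Sum the partial counts from all partitions (counts are additive)."""
    doc_freq, co_freq = Counter(), Counter()
    for part_doc_freq, part_co_freq in partials:
        doc_freq.update(part_doc_freq)
        co_freq.update(part_co_freq)
    return doc_freq, co_freq


def build_concept_space(doc_freq, co_freq):
    """Turn raw counts into asymmetric term-to-term weights.

    Assumed weight: share of documents containing `a` that also contain `b`.
    """
    space = {}
    for (a, b), n in co_freq.items():
        space.setdefault(a, {})[b] = n / doc_freq[a]
        space.setdefault(b, {})[a] = n / doc_freq[b]
    return space


def suggest_terms(space, query_term, k=3):
    """Return up to k related terms, strongest first, ties broken by name."""
    related = space.get(query_term, {})
    return sorted(related, key=lambda t: (-related[t], t))[:k]


# A made-up four-document collection, split into two partitions.
collection = [
    ["gene", "protein", "sequence"],
    ["gene", "protein"],
    ["gene", "expression"],
    ["protein", "structure"],
]
partitions = [collection[:2], collection[2:]]
partials = [count_partition(p) for p in partitions]
concept_space = build_concept_space(*merge_counts(partials))
suggestions = suggest_terms(concept_space, "gene", k=2)
```

Because the per-partition counts are plain sums, merging them gives the same concept space as counting over the whole collection at once. In a real message-passing setting, each process would compute its own partial counts and send them to a coordinator for merging.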