Performance and implications of semantic indexing in a distributed environment

Authors:
Conrad T. K. Chang;Bruce R. Schatz
Affiliations:
Community Architectures for Network Information Systems (CANIS), Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 704 S. Sixth Street, Champaign, IL;Community Architectures for Network Information Systems (CANIS), Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 704 S. Sixth Street, Champaign, IL
Venue:
Proceedings of the eighth international conference on Information and knowledge management
Year:
1999

Citing 5
Cited 2

Distributed systems

PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing

A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project

IEEE Transactions on Pattern Analysis and Machine Intelligence
Information storage and retrieval

Federated Search of Scientific Literature

Computer

Semantic indexing for a complete subject discipline

Proceedings of the fourth ACM conference on Digital libraries
Meta-data Extraction and Query Translation. Treatment of Semantic Heterogeneity

ECDL '02 Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries

Abstract

We present a research prototype for semantic indexing and retrieval. The prototype is motivated by the desire for an information retrieval system that is more efficient and more effective than the current state of the art. We give an overview of the layers of the Interspace architecture and develop an object model that supports semantic operations. The model contains a rich set of classes and data relationships for the semantic indexing module. Our semantic indexing is based on the creation of a concept space. A concept space is an index of a collection that uses document statistics to capture the relationships between concepts. It is useful for boosting text search by suggesting alternative terms that are semantically related to the query terms. Over the years, we have developed generic technology for computing concept spaces on large collections across many subjects. Recent computations on discipline-scale collections have been made on high-end supercomputers. This paper describes our implementation of the computation in a distributed computing environment and its implications. Experimental results for different collection sizes and numbers of processes are presented to show the feasibility of this approach. We also show that laboratory and community collections are already easily computable using a group of PCs in a lab via a message-passing model. We conclude that PC clusters will shortly be able to compute semantic indexes for any real collection.
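To make the concept space idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes documents are already reduced to lists of concept terms and uses a simple asymmetric co-occurrence weight (documents containing both terms divided by documents containing the first term); the abstract does not give the actual weighting formula, so this one is an assumption. The function names (`partial_counts`, `merge`, `build_concept_space`, `suggest`) are hypothetical. Counting per partition and then merging loosely mirrors how separate processes could each handle part of a collection and send their partial statistics to a coordinator.

```python
from collections import Counter
from itertools import permutations


def partial_counts(docs):
    """Count document frequencies and ordered co-occurrences for one partition.

    Each document is a list of concept terms. In a distributed setting,
    one worker process would run this on its share of the collection.
    """
    df = Counter()  # term -> number of documents containing it
    co = Counter()  # (term_a, term_b) -> number of documents containing both
    for terms in docs:
        unique = set(terms)
        df.update(unique)
        co.update(permutations(sorted(unique), 2))
    return df, co


def merge(partials):
    """Sum the partial statistics returned by all partitions."""
    df, co = Counter(), Counter()
    for part_df, part_co in partials:
        df.update(part_df)
        co.update(part_co)
    return df, co


def build_concept_space(df, co):
    """Map each term to its related terms with an assumed weight.

    weight(a -> b) = docs containing both a and b / docs containing a.
    """
    space = {}
    for (a, b), n in co.items():
        space.setdefault(a, {})[b] = n / df[a]
    return space


def suggest(space, term, k=3):
    """Return up to k related terms, strongest first (ties broken alphabetically)."""
    related = space.get(term, {})
    return sorted(related, key=lambda t: (-related[t], t))[:k]


if __name__ == "__main__":
    docs = [
        ["semantic", "indexing", "retrieval"],
        ["semantic", "retrieval"],
        ["parallel", "computing", "retrieval"],
        ["parallel", "computing"],
    ]
    # Two partitions stand in for two worker processes.
    partitions = [docs[:2], docs[2:]]
    df, co = merge(partial_counts(p) for p in partitions)
    space = build_concept_space(df, co)
    print(suggest(space, "semantic"))
```

Because the per-partition counts are plain sums, merging them gives the same statistics as a single pass over the whole collection, which is what makes this step easy to spread across a group of machines.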