A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project

Authors:
Hsinchun Chen; Bruce Schatz; Tobun Ng; Joanne Martinez; Amy Kirchhoff; Chienting Lin
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
1996

Abstract

This research presents preliminary results from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across them could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. We previously experimented with this technique in a smaller molecular biology domain (the Worm Community System, with a document collection of 10+ MB), with encouraging results.
To address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we recently conducted experiments using the concept space approach on parallel supercomputers. Our test collection included 2+ GB of computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising. The Power Challenge was later selected to create a comprehensive computer engineering concept space of about 270,000 terms and 4,000,000+ links using 24.5 hours of CPU time. Our system evaluation involving 12 knowledgeable subjects revealed that the automatically created computer engineering concept space generated significantly higher concept recall than the human-generated INSPEC computer engineering thesaurus. However, the INSPEC thesaurus was more precise than the automatic concept space.
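The recall and precision comparison reported above can be illustrated with a minimal sketch. It assumes the standard set-based definitions (recall is the share of relevant terms that a source suggests; precision is the share of suggested terms that are relevant). The function name `concept_recall_precision` and all term lists are hypothetical and invented for illustration; they are not data from the study, and the abstract does not specify the exact scoring procedure used with the 12 subjects.

```python
def concept_recall_precision(suggested, relevant):
    """Return (recall, precision) for a set of suggested terms.

    Assumes standard set-based definitions; the evaluation protocol
    actually used in the study is not given in the abstract.
    """
    suggested, relevant = set(suggested), set(relevant)
    hits = suggested & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(suggested) if suggested else 0.0
    return recall, precision


# Hypothetical terms a subject judged relevant to one query concept.
relevant = {"neural network", "back propagation",
            "pattern recognition", "hopfield net"}

# Hypothetical suggestions: a broad automatic list vs. a short curated list.
automatic = {"neural network", "back propagation", "pattern recognition",
             "vlsi", "fuzzy logic", "database"}
thesaurus = {"neural network", "pattern recognition"}

print(concept_recall_precision(automatic, relevant))  # (0.75, 0.5)
print(concept_recall_precision(thesaurus, relevant))  # (0.5, 1.0)
```

In this toy case the broader automatic list recovers more of the relevant terms but also suggests more irrelevant ones, which is the kind of trade-off the evaluation describes.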
Our current work mainly involves creating concept spaces for other major engineering domains and developing robust graph matching and traversal algorithms for cross-domain, concept-based retrieval. Future work will also include generating individualized concept spaces to support user-specific, concept-based information retrieval.
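The idea of a concept space as a graph of terms linked by weighted co-occurrence relationships can be sketched as follows. This is a simplified illustration, not the weighting function used in the project: it assumes each document has already been reduced to a list of index terms, and it uses a plain asymmetric conditional frequency (documents containing both terms divided by documents containing the source term) as the link weight. The names `build_concept_space` and `related_terms`, and the sample documents, are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_concept_space(docs):
    """Build a directed, weighted term graph from per-document term lists.

    Assumed weight: w(a -> b) = df(a, b) / df(a), where df counts documents.
    This is a stand-in for the co-occurrence analysis named in the abstract.
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing each ordered pair
    for terms in docs:
        unique = set(terms)
        doc_freq.update(unique)
        pair_freq.update(permutations(sorted(unique), 2))

    space = {}
    for (a, b), n_ab in pair_freq.items():
        space.setdefault(a, {})[b] = n_ab / doc_freq[a]
    return space


def related_terms(space, term, k=3):
    """Return up to k neighbors of term, strongest link first."""
    neighbors = space.get(term, {})
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical index terms for four abstracts.
docs = [
    ["parallel computing", "information retrieval", "thesaurus"],
    ["information retrieval", "thesaurus"],
    ["parallel computing", "supercomputer"],
    ["information retrieval", "indexing"],
]

space = build_concept_space(docs)
print(related_terms(space, "thesaurus"))
# [('information retrieval', 1.0), ('parallel computing', 0.5)]
```

The weights are deliberately asymmetric: here "thesaurus" always appears with "information retrieval" (weight 1.0), while the reverse link is weaker (2/3). Because every document contributes to the pair counts independently, the counting loop can be partitioned across processors and the partial counts summed, which hints at why this kind of analysis suits parallel machines.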