Full text indexing based on lexical relations an application: software libraries

Authors:
Y. S. Maarek; F. Z. Smadja
Affiliations:
IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY; Department of Computer Science, Columbia University, New York, NY
Venue:
SIGIR '89 Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1989

Abstract

In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, users are mainly concerned with the functionality of the desired component; implementation details are secondary. Software reuse would benefit from large libraries of software components organized by concept. In this paper, we present GURU, a tool that automatically builds such large software libraries from documented software components. We focus here on GURU's indexing component, which extracts conceptual attributes from natural-language documentation. This indexing method is based on word co-occurrences. It first uses EXTRACT, a co-occurrence knowledge compiler, to extract potential attributes from textual documents. Conceptually relevant collocations are then selected according to their resolving power, which reduces the noise introduced by context words. This fully automated indexing tool thus goes further than keyword-based tools in understanding a document, without the brittleness of knowledge-based tools. The indexing component of GURU is fully implemented, and results are reported in the paper.
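The abstract describes a two-stage pipeline: collect word co-occurrences from each document, then keep the pairs with the highest resolving power. The sketch below illustrates that idea under stated assumptions and is not the GURU or EXTRACT implementation. The window size of 5 words, the small stopword list, and the scoring formula are all assumptions made for illustration. Resolving power is approximated here as a pair's relative frequency in the document multiplied by -log2 of its relative frequency across the whole library, a tf-idf-like quantity. The paper's exact definition may differ. The function names `cooccurrences`, `resolving_power`, and `index_library` are hypothetical.

```python
import math
import re
from collections import Counter

WINDOW = 5  # assumed: two words co-occur if both fall within 5 consecutive words
# Assumed minimal stopword list; removed before windowing (a design assumption).
STOPWORDS = {"a", "an", "the", "to", "from", "of", "and", "or", "in",
             "on", "for", "is", "it", "this", "with", "by"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrences(text, window=WINDOW):
    """Count unordered word pairs appearing within `window` consecutive words."""
    words = tokenize(text)
    pairs = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1 : i + window]:  # at most window - 1 words ahead
            if v != w:
                pairs[tuple(sorted((w, v)))] += 1
    return pairs


def resolving_power(doc_pairs, corpus_pairs):
    """Assumed score: P_doc(pair) * -log2(P_corpus(pair)).

    Pairs that are frequent in this document but rare in the library score
    high; pairs common to many documents (context noise) score low.
    """
    total_doc = sum(doc_pairs.values())
    total_corpus = sum(corpus_pairs.values())
    scores = {}
    for pair, n in doc_pairs.items():
        p_doc = n / total_doc
        p_corpus = corpus_pairs[pair] / total_corpus  # > 0: doc is in the corpus
        scores[pair] = p_doc * -math.log2(p_corpus)
    return scores


def index_library(docs, k=3):
    """Return the k highest-scoring pairs for each document (ties broken alphabetically)."""
    per_doc = [cooccurrences(d) for d in docs]
    corpus = Counter()
    for counts in per_doc:
        corpus.update(counts)
    index = []
    for counts in per_doc:
        scores = resolving_power(counts, corpus)
        ranked = sorted(scores, key=lambda p: (-scores[p], p))
        index.append(ranked[:k])
    return index


if __name__ == "__main__":
    docs = [
        "copy a file to a directory",
        "remove a file from a directory",
        "copy a string to a buffer",
    ]
    for doc, attrs in zip(docs, index_library(docs)):
        print(doc, "->", attrs)
```

In this toy library the pair `("directory", "file")` appears in two documents, so it ranks last for both of them, while pairs specific to one component, such as `("copy", "file")`, rank first. This is the noise-reducing effect the abstract attributes to resolving power, shown here with an assumed scoring function.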