Corpus-dependent association thesauri for information retrieval

Authors:
Hiroyuki Kaji; Yasutsugu Morimoto; Toshiko Aizono; Noriyuki Yamasaki
Affiliations:
Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan; Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan; Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan; Software Division, Hitachi, Ltd., Kanagawa, Japan
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Abstract

This paper presents a method for automatically generating an association thesaurus from a text corpus and demonstrates its application to information retrieval. The thesaurus generation method extracts terms and co-occurrence data from a corpus and then analyzes the correlation between terms statistically. A new method for disambiguating the structure of compound nouns, a key component of term extraction, is also proposed. The automatically generated thesaurus serves as an effective tool for exploring information, and a thesaurus navigator with novel functions such as term clustering, thesaurus overview, and zooming in is proposed.
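The generation step summarized above (extract terms, collect co-occurrence data, measure term correlation statistically) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: terms are already extracted (the compound-noun analysis is out of scope), co-occurrence is counted at the document level, and association strength is pointwise mutual information (PMI). The function name `build_association_thesaurus` and its parameters `min_cooc` and `top_k` are hypothetical and do not describe the paper's actual implementation.

```python
import math
from collections import Counter
from itertools import combinations


def build_association_thesaurus(docs, min_cooc=2, top_k=5):
    """Hypothetical sketch: map each term to its most strongly associated terms.

    docs: list of documents, each already reduced to a list of extracted terms.
    Association strength is PMI over document-level co-occurrence, an assumed
    measure that is not necessarily the one used in the paper.
    """
    n_docs = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))  # count each term once per document
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))  # pairs come out in sorted order

    thesaurus = {t: [] for t in term_df}
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_cooc:
            continue  # too few co-occurrences for a reliable estimate
        # PMI = log(P(a,b) / (P(a) * P(b))), probabilities estimated per document
        pmi = math.log(n_ab * n_docs / (term_df[a] * term_df[b]))
        thesaurus[a].append((b, pmi))
        thesaurus[b].append((a, pmi))

    for t in thesaurus:
        # strongest associations first; ties broken alphabetically
        thesaurus[t].sort(key=lambda x: (-x[1], x[0]))
        thesaurus[t] = thesaurus[t][:top_k]
    return thesaurus


# Toy corpus (invented for illustration): each document is a list of extracted terms
docs = [
    ["information retrieval", "thesaurus", "query"],
    ["information retrieval", "thesaurus", "corpus"],
    ["compound noun", "corpus", "collocation"],
    ["compound noun", "collocation", "parsing"],
]
th = build_association_thesaurus(docs)
print(th["thesaurus"])
```

PMI is used here only because it is compact to show. It is known to overrate pairs of rare terms, which is why the `min_cooc` threshold is applied. A significance-based score such as a log-likelihood ratio could be substituted in the same loop without changing the surrounding structure.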