Corpus-dependent association thesauri for information retrieval

  • Authors:
  • Hiroyuki Kaji; Yasutsugu Morimoto; Toshiko Aizono; Noriyuki Yamasaki

  • Affiliations:
  • Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan; Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan; Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan; Software Division, Hitachi, Ltd., Kanagawa, Japan

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
  • Year:
  • 2000

Abstract

This paper presents a method for automatically generating an association thesaurus from a text corpus and demonstrates its application to information retrieval. The thesaurus generation method consists of extracting terms and co-occurrence data from a corpus and statistically analyzing the correlation between terms. A new method for disambiguating the structure of compound nouns, which is a key component of term extraction, is also proposed. The automatically generated thesaurus serves as an effective tool for exploring information. A thesaurus navigator with novel functions such as term clustering, thesaurus overview, and zooming-in is proposed.
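
The generation step described in the abstract (collect co-occurrence data, then score the correlation between terms) can be illustrated with a minimal sketch. The abstract does not name the statistic or the co-occurrence window, so this sketch assumes document-level co-occurrence and pointwise mutual information (PMI) as the correlation measure. The function name `build_association_thesaurus`, the `min_pmi` threshold, and the sample data are hypothetical, and term extraction (including compound-noun analysis) is assumed to have been done already.

```python
import math
from collections import Counter
from itertools import combinations


def build_association_thesaurus(documents, min_pmi=0.0):
    """Map each term to its associated terms, scored by PMI.

    documents: list of term lists, one per document (terms already extracted).
    Assumption: two terms co-occur when they appear in the same document.
    """
    n_docs = len(documents)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair

    for terms in documents:
        unique = sorted(set(terms))  # sorting gives each pair a canonical order
        term_df.update(unique)
        pair_df.update(combinations(unique, 2))

    thesaurus = {}
    for (a, b), n_ab in pair_df.items():
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), probabilities estimated from counts
        pmi = math.log2(n_ab * n_docs / (term_df[a] * term_df[b]))
        if pmi > min_pmi:
            thesaurus.setdefault(a, {})[b] = pmi
            thesaurus.setdefault(b, {})[a] = pmi
    return thesaurus


# Hypothetical toy corpus: four documents, each reduced to its extracted terms.
sample_docs = [
    ["thesaurus", "retrieval"],
    ["thesaurus", "retrieval"],
    ["corpus", "noun"],
    ["corpus", "retrieval"],
]
sample_thesaurus = build_association_thesaurus(sample_docs)
```

In this toy corpus, "corpus" and "retrieval" co-occur less often than their individual frequencies would predict, so the pair falls below the threshold and is left out of the thesaurus. A real system would likely use a narrower co-occurrence window and a minimum-frequency cutoff, since PMI tends to overrate rare terms.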