Experiments in automatic statistical thesaurus construction

  • Authors:
  • Carolyn J. Crouch;Bokyung Yang

  • Affiliations:
  • Department of Computer Science, University of Minnesota, Duluth, Duluth, MN;West Publishing Company, Eagan, Minnesota and Department of Computer Science, University of Minnesota, Duluth, Duluth, MN

  • Venue:
  • SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 1992

Abstract

A well-constructed thesaurus has long been recognized as a valuable tool in the effective operation of an information retrieval system. This paper reports the results of experiments designed to determine the validity of an approach to the automatic construction of global thesauri (described originally by Crouch in [1] and [2]) based on a clustering of the document collection. The authors validate the approach by showing that the use of thesauri generated by this method results in substantial improvements in retrieval effectiveness in four test collections. The term discrimination value theory, used in the thesaurus generation algorithm to determine a term's membership in a particular thesaurus class, is found not to be useful in distinguishing a “good” from an “indifferent” or “poor” thesaurus class. In conclusion, the authors suggest an alternate approach to automatic thesaurus construction which greatly simplifies the work of producing viable thesaurus classes. Experimental results show that the alternate approach described herein in some cases produces, at much lower cost, thesauri which are comparable in retrieval effectiveness to those produced by the first method.