Automatic cluster stopping with criterion functions and the gap statistic

  • Authors:
  • Ted Pedersen; Anagha Kulkarni

  • Affiliations:
  • University of Minnesota, Duluth, Duluth, MN; University of Minnesota, Duluth, Duluth, MN

  • Venue:
  • NAACL-Demonstrations '06 Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume: demonstrations
  • Year:
  • 2006

Abstract

SenseClusters is a freely available system that clusters similar contexts. It can be applied to a wide range of problems, although here we focus on word sense discrimination and name discrimination. It supports several measures for automatically determining the number of clusters into which a collection of contexts should be grouped. These measures can be used to discover the number of senses in which a word is used in a large corpus of text, or the number of entities that share the same name. Three of the measures are based on clustering criterion functions, and a fourth is based on the Gap Statistic.
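
The abstract does not say how SenseClusters adapts the Gap Statistic, so the following is only a minimal sketch of the general idea in its standard form (Tibshirani, Walther, and Hastie, 2001), not the SenseClusters implementation. It assumes contexts are already represented as numeric vectors, uses a plain k-means as the clustering step, and draws reference data uniformly over the bounding box of the input. The names `kmeans_dispersion` and `gap_statistic` are hypothetical, chosen for this illustration.

```python
import math
import random


def kmeans_dispersion(points, k, rng, iters=50):
    """Run a plain k-means and return the within-cluster sum of squared distances.

    Assumption: farthest-first seeding is used here only to keep the sketch
    stable; it is not taken from the paper.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest already-chosen seed.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)


def gap_statistic(points, max_k, n_refs=10, seed=0):
    """Return (chosen_k, gaps), where gaps[k - 1] is Gap(k) for k = 1..max_k.

    Assumes max_k is well below the number of distinct points, so the
    dispersion never reaches zero.
    """
    rng = random.Random(seed)
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    gaps, errs = [], []
    for k in range(1, max_k + 1):
        log_w = math.log(kmeans_dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: same size as the input, uniform over its bounding box.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            ref_logs.append(math.log(kmeans_dispersion(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Stop at the smallest k with Gap(k) >= Gap(k + 1) - s_{k + 1}.
    for k in range(1, max_k):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return max_k, gaps


# Usage: three synthetic, well-separated groups of 2-D points (made-up data).
data_rng = random.Random(42)
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5)) for cx, cy in blobs for _ in range(30)]
chosen_k, gap_values = gap_statistic(data, max_k=6)
print(chosen_k, [round(g, 2) for g in gap_values])
```

In this sketch the gap compares the log dispersion of the observed clustering with what would be expected from structureless data; a sharp rise in the gap at some k suggests that the data contain about that many natural groups, which is the sense in which such a measure can stop clustering automatically.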