Term Clustering Using a Corpus-Based Similarity Measure

  • Authors:
  • Goran Nenadic; Irena Spasic; Sophia Ananiadou

  • Venue:
  • TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
  • Year:
  • 2002

Abstract

In this paper we present a method for automatic term clustering. The method uses a hybrid similarity measure to cluster terms that are automatically extracted from a corpus using the C/NC-value method. The measure combines contextual, functional and lexical similarity, and is used to populate the cells of a similarity matrix. The clustering algorithm uses either the nearest neighbour method or Ward's method to calculate the distance between clusters. The approach has been tested and evaluated in the domain of molecular biology, and the results are presented.
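
The pipeline described in the abstract (combine three similarity components, fill a similarity matrix, then cluster agglomeratively) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not say how the three components are combined, so an equally weighted linear combination is used; the term names and component scores are invented placeholders; the stopping threshold and the function names are hypothetical. Only the nearest neighbour (single-link) variant is shown; Ward's method is not implemented.

```python
from itertools import combinations


def hybrid_similarity(contextual, functional, lexical, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three component scores into one value.

    ASSUMPTION: a weighted linear combination with equal weights.
    The abstract does not state the actual combination formula.
    """
    w_c, w_f, w_l = weights
    return w_c * contextual + w_f * functional + w_l * lexical


def similarity_matrix(terms, scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fill a symmetric term-by-term matrix with hybrid similarity values.

    `scores` maps frozenset({term1, term2}) to a tuple
    (contextual, functional, lexical); missing pairs default to zero.
    """
    n = len(terms)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        parts = scores.get(frozenset((terms[i], terms[j])), (0.0, 0.0, 0.0))
        sim[i][j] = sim[j][i] = hybrid_similarity(*parts, weights=weights)
    return sim


def single_link_clusters(terms, sim, threshold):
    """Nearest neighbour (single-link) agglomerative clustering.

    Repeatedly merges the two clusters whose closest members are most
    similar, and stops when that similarity falls below `threshold`
    (a hypothetical stopping rule chosen for this sketch).
    """
    clusters = [[i] for i in range(len(terms))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            # Single link: cluster similarity is that of the closest pair.
            s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
            if s > best:
                best, pair = s, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(terms[i] for i in c) for c in clusters]


# Placeholder terms and invented component scores, for illustration only.
terms = ["term_a", "term_b", "term_c", "term_d"]
scores = {
    frozenset(("term_a", "term_b")): (0.9, 0.6, 0.9),
    frozenset(("term_c", "term_d")): (0.6, 0.6, 0.6),
    frozenset(("term_a", "term_c")): (0.3, 0.0, 0.0),
}
sim = similarity_matrix(terms, scores)
clusters = single_link_clusters(terms, sim, threshold=0.5)
print(clusters)
```

With these invented scores, the two strongly related pairs are merged first and the weak link between term_a and term_c stays below the threshold, so the sketch ends with two clusters. For real use, a library routine such as SciPy's hierarchical clustering would typically replace the hand-written loop, after converting similarities to distances.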