Semantic Clustering for a Functional Text Classification Task

  • Authors:
  • Thomas Lippincott; Rebecca Passonneau

  • Affiliations:
  • Department of Computer Science Center for Computational Learning Systems, Columbia University, New York, NY, USA; Department of Computer Science Center for Computational Learning Systems, Columbia University, New York, NY, USA

  • Venue:
  • CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

We describe a semantic clustering method designed to address shortcomings of the common bag-of-words document representation in functional semantic classification tasks. The method uses WordNet-based distance metrics to construct a similarity matrix, and expectation maximization to find and represent clusters of semantically related terms. Using these clusters as features for machine learning helps maintain performance across distinct, domain-specific vocabularies while reducing the size of the document representation. We present promising results along these lines, and evaluate several algorithms and parameters that influence machine learning performance. We discuss limitations of the study and future work for optimizing and evaluating the method.
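The pipeline outlined in the abstract (term similarity matrix, expectation maximization over terms, clusters used as document features) can be illustrated with a minimal sketch. This is not the authors' implementation. The similarity values below are made-up stand-ins for WordNet-based scores, and the mixture model (spherical Gaussians with a fixed variance, fitted to the rows of the matrix), the initialization scheme, and all function names are assumptions chosen only to keep the example small and deterministic.

```python
import math

# Hypothetical similarity matrix: toy values standing in for
# WordNet-based similarity scores between six terms.
TERMS = ["car", "truck", "bus", "apple", "pear", "plum"]
SIM = [
    [1.0, 0.9, 0.8, 0.1, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.2, 0.1, 0.1],
    [0.8, 0.8, 1.0, 0.1, 0.2, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9, 0.8],
    [0.1, 0.1, 0.2, 0.9, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.8, 0.9, 1.0],
]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def em_cluster(rows, k, iters=50, var=0.05):
    """Soft-cluster rows with EM for a mixture of spherical Gaussians.

    The variance is held fixed (an assumption made for brevity); only the
    mixing weights and the component means are re-estimated.
    Returns one list of k responsibilities per row.
    """
    n, dim = len(rows), len(rows[0])
    # Deterministic farthest-point initialization of the means.
    means = [list(rows[0])]
    while len(means) < k:
        far = max(rows, key=lambda r: min(sqdist(r, m) for m in means))
        means.append(list(far))
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each row.
        resp = []
        for r in rows:
            logp = [math.log(weights[j]) - sqdist(r, means[j]) / (2 * var)
                    for j in range(k)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: update weights and means from the responsibilities.
        for j in range(k):
            mass = max(sum(resp[i][j] for i in range(n)), 1e-12)
            weights[j] = max(mass / n, 1e-12)
            means[j] = [sum(resp[i][j] * rows[i][d] for i in range(n)) / mass
                        for d in range(dim)]
    return resp


def cluster_features(tokens, assignment, k):
    """Count how many tokens of a document fall into each term cluster."""
    feats = [0] * k
    for tok in tokens:
        if tok in assignment:
            feats[assignment[tok]] += 1
    return feats


K = 2
resp = em_cluster(SIM, K)
# Hard assignment: each term goes to its most probable cluster.
assignment = {term: max(range(K), key=lambda j: resp[i][j])
              for i, term in enumerate(TERMS)}
doc_vector = cluster_features(["truck", "bus", "plum", "the"], assignment, K)
print(assignment, doc_vector)
```

On this toy matrix the vehicle terms and the fruit terms should fall into separate clusters, so a document is represented by K counts rather than one count per vocabulary term. In a real setting the matrix entries would come from a WordNet-based measure (for example, NLTK exposes `path_similarity` on synsets), and the number of clusters would be one of the parameters to tune.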