A comparative study on Chinese word clustering

  • Authors:
  • Bo Wang;Houfeng Wang

  • Affiliations:
  • Institute of Computational Linguistics, School of Electronic Engineering and Computer Science, Peking University, Beijing, China;Institute of Computational Linguistics, School of Electronic Engineering and Computer Science, Peking University, Beijing, China

  • Venue:
  • ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
  • Year:
  • 2006

Abstract

This paper evaluates four unsupervised Chinese word clustering methods: maximum mutual information (MMI), function word (FW), high frequent word (HFW), and word cluster (WC). Two evaluation measures are employed: part-of-speech (POS) precision and semantic precision. Test results show that MMI performs best, reaching 79.09% POS precision and 49.75% semantic precision, while the other three methods each exceed 51.09% and 29.78%, respectively. When the word clusters generated by these methods are applied to alignment-based automatic Chinese syntactic induction, the induction performance is further improved.