A comparative study on Chinese word clustering

Authors:
Bo Wang;Houfeng Wang
Affiliations:
Institute of Computational Linguistics, School of Electronic Engineering and Computer Science, Peking University, Beijing, China;Institute of Computational Linguistics, School of Electronic Engineering and Computer Science, Peking University, Beijing, China
Venue:
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
Year:
2006

Abstract

This paper evaluates four unsupervised Chinese word clustering methods: maximum mutual information (MMI), function word (FW), high frequent word (HFW), and word cluster (WC). Two evaluation measures are employed: part-of-speech (POS) precision and semantic precision. Test results show that MMI performs best, reaching 79.09% POS precision and 49.75% semantic precision, while the other three methods exceed 51.09% and 29.78% respectively. Applying the word clusters generated by these methods to alignment-based automatic Chinese syntactic induction further improves its performance.
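
The abstract does not define how POS precision is computed, so the sketch below is only one plausible reading, not the authors' measure. It assumes that each cluster is credited with the words carrying its majority gold POS tag, and that precision is the credited fraction over all tagged words. The function name `cluster_precision`, the toy clusters, and the tag labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def cluster_precision(clusters, gold_tags):
    """Fraction of tagged words that share the majority gold tag of their cluster.

    ASSUMPTION: this majority-tag definition is illustrative; the abstract
    does not state the exact formula used in the paper.
    """
    matched = 0
    total = 0
    for words in clusters:
        # Keep only words for which a gold tag is available.
        tags = [gold_tags[w] for w in words if w in gold_tags]
        if not tags:
            continue
        # Count of the most frequent tag in this cluster.
        matched += Counter(tags).most_common(1)[0][1]
        total += len(tags)
    return matched / total if total else 0.0


# Hypothetical toy data: two clusters and hand-assigned POS labels.
clusters = [["我", "你", "他"], ["吃", "喝", "书"]]
gold_tags = {"我": "r", "你": "r", "他": "r", "吃": "v", "喝": "v", "书": "n"}
precision = cluster_precision(clusters, gold_tags)
```

Under the same assumption, a semantic precision score could be computed by passing semantic category labels in place of POS tags.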