This paper evaluates four unsupervised Chinese word clustering methods: maximum mutual information (MMI), function word (FW), high-frequency word (HFW), and word cluster (WC). Two evaluation measures are used: part-of-speech (POS) precision and semantic precision. Test results show that MMI performs best, reaching 79.09% POS precision and 49.75% semantic precision, while the other three methods exceed 51.09% and 29.78%, respectively. Applying the word clusters generated by these methods to alignment-based automatic Chinese syntactic induction further improves induction performance.
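The abstract does not define how POS precision or semantic precision is computed. As a rough illustration only, the sketch below assumes a majority-label definition: each cluster is credited with the words that carry its most common gold label, and precision is the credited share of all labeled words. The function name `cluster_precision`, the toy clusters, and the tag set are hypothetical and do not come from the paper.

```python
from collections import Counter


def cluster_precision(clusters, gold_label):
    """Majority-label precision over all clustered words (assumed definition).

    clusters:   iterable of word lists, one list per induced cluster
    gold_label: dict mapping a word to its gold label (e.g. a POS tag)
    """
    correct = 0
    total = 0
    for words in clusters:
        # Ignore words that have no gold label.
        labels = [gold_label[w] for w in words if w in gold_label]
        if not labels:
            continue
        # Credit the words that agree with the cluster's most common label.
        correct += Counter(labels).most_common(1)[0][1]
        total += len(labels)
    return correct / total if total else 0.0


# Toy data (hypothetical, not from the paper): "v" = verb, "n" = noun.
clusters = [["吃", "喝", "书"], ["桌子", "椅子"]]
pos = {"吃": "v", "喝": "v", "书": "n", "桌子": "n", "椅子": "n"}
print(cluster_precision(clusters, pos))
```

The same function could score semantic precision by passing a word-to-semantic-class mapping in place of the POS mapping, again under the assumed majority-label definition.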