This paper presents the design and construction of an annotated Chinese collocation bank, intended as a resource for systematic research on Chinese collocations. With the help of computational tools, the bi-gram and n-gram collocations of 3,643 head-words were manually identified. The bi-gram collocations are further annotated with their dependency relations, chunking relations, and collocation types. The collocation bank currently contains 23,581 annotated bi-gram collocations and 2,752 annotated n-gram collocations, extracted from a 5-million-word corpus. A statistical analysis of the collocation bank is then used to examine characteristics of Chinese bi-gram collocations that are essential to collocation research, particularly for Chinese.
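The abstract does not describe the bank's storage format or its tag sets. The sketch below assumes a hypothetical record layout with one field per annotation layer mentioned above (dependency relation, chunking relation, collocation type). It shows the kind of frequency statistics one might compute over such entries. All field names, labels such as `VO` or `type_A`, and the toy entries are invented for illustration and are not taken from the collocation bank.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BigramCollocation:
    """One annotated bi-gram entry (hypothetical schema, for illustration only)."""
    headword: str
    collocate: str
    dependency: str  # dependency relation label (placeholder tag set)
    chunking: str    # chunking relation label (placeholder tag set)
    ctype: str       # collocation type label (placeholder tag set)


def type_distribution(entries):
    """Return the relative frequency of each collocation type label."""
    counts = Counter(e.ctype for e in entries)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def per_headword(entries):
    """Count annotated bi-gram collocations per head-word."""
    return Counter(e.headword for e in entries)


# Toy entries: invented examples, NOT drawn from the collocation bank.
toy = [
    BigramCollocation("提高", "水平", "VO", "same_chunk", "type_A"),
    BigramCollocation("提高", "效率", "VO", "same_chunk", "type_A"),
    BigramCollocation("严重", "问题", "AN", "same_chunk", "type_B"),
    BigramCollocation("提高", "认识", "VO", "cross_chunk", "type_B"),
]

print(type_distribution(toy))  # {'type_A': 0.5, 'type_B': 0.5}
print(per_headword(toy))       # Counter({'提高': 3, '严重': 1})
```

Keeping each annotation layer in its own field lets the same `Counter`-based aggregation be applied to any one layer, for example grouping by `dependency` instead of `ctype`.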