Foundations of statistical natural language processing
Retrieving collocations from text: Xtract
Computational Linguistics - Special issue on using large corpora: I
To automatically extract Chinese collocations and build a large-scale collocation bank, we are developing a one-million-word shallow-parsed Chinese treebank. The treebank can serve both as a training set for our shallow parser and as processed data from which collocations are extracted. This paper presents several issues related to this ongoing project, including our definition of shallow parsing as used in Chinese collocation extraction, guideline preparation, and quality control.