Building a Chinese shallow parsed treebank for collocation extraction

Authors:
Li Baoli; Lu Qin; Li Yin
Affiliations:
Department of Computer Science and Technology, Peking University, Beijing, P.R. China; Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong; Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Venue:
CICLing'03: Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2003

Abstract

To automatically extract Chinese collocations and build a large-scale collocation bank, we are developing a one-million-word Chinese shallow parsed treebank. The treebank serves two purposes: it is the training set for our shallow parser, and it is the processed data from which collocations are extracted. This paper discusses several issues in this ongoing project, including our definition of shallow parsing as used in Chinese collocation extraction, guideline preparation, and quality control.