Annotating Chinese collocations with multi information

  • Authors:
  • Ruifeng Xu;Qin Lu;Kam-Fai Wong;Wenjie Li

  • Affiliations:
  • The Hong Kong Polytechnic University, Kowloon, Hong Kong;The Hong Kong Polytechnic University, Kowloon, Hong Kong;The Chinese University of Hong Kong, N.T., Hong Kong;The Hong Kong Polytechnic University, Kowloon, Hong Kong

  • Venue:
  • LAW '07 Proceedings of the Linguistic Annotation Workshop
  • Year:
  • 2007

Abstract

This paper presents the design and construction of an annotated Chinese collocation bank, a resource to support systematic research on Chinese collocations. With the help of computational tools, the bi-gram and n-gram collocations corresponding to 3,643 head-words are manually identified. Annotations for bi-gram collocations additionally include the dependency relation, the chunking relation, and the collocation type. Currently, the collocation bank contains 23,581 annotated bi-gram collocations and 2,752 n-gram collocations extracted from a 5-million-word corpus. Statistical analysis of the collocation bank is used to examine some characteristics of Chinese bi-gram collocations that are essential to collocation research, especially for Chinese.