Corpus-Based Extraction of Collocations in Chinese

  • Authors:
  • Wang Hui; Ji Donghong

  • Venue:
  • WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
  • Year:
  • 2008

Abstract

Collocations, i.e. sequences of words that habitually co-occur, play an essential part in human language. The present study aims to identify the detailed classification and typical features of collocations in Chinese, and to explore a new computer-assisted approach to extracting and representing Chinese collocations. The investigation is based on the Singapore Chinese Corpus (SCC), the largest and only corpus of its kind, of which 20 million words have been analysed. The central novel idea of this research is the combination of a dictionary, linguistic rules and statistical data in automatic collocation extraction. This method has not previously been proposed.
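
The abstract does not describe how the three sources of evidence are combined, so the following is only an illustrative sketch and not the authors' method. It assumes the corpus is already word-segmented and part-of-speech tagged, uses pointwise mutual information (PMI) over adjacent word pairs as a stand-in for the "statistical data", a plain set of words as the "dictionary", and a set of allowed tag pairs as the "linguistic rules". The function name `extract_collocations`, its parameters, and the thresholds are all hypothetical.

```python
import math
from collections import Counter


def extract_collocations(sentences, lexicon, allowed_patterns,
                         min_count=2, min_pmi=1.0):
    """Rank adjacent word pairs as collocation candidates.

    sentences: list of sentences, each a list of (word, pos_tag) tuples
               (assumed to be segmented and tagged beforehand).
    lexicon: set of words accepted by the dictionary filter (assumption).
    allowed_patterns: set of (tag1, tag2) pairs accepted by the rule
                      filter (assumption).
    Returns a list of (word1, word2, pmi), highest PMI first.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        for word, _tag in sent:
            word_counts[word] += 1
        # Count adjacent tagged pairs only; wider windows are out of scope here.
        for left, right in zip(sent, sent[1:]):
            pair_counts[(left, right)] += 1

    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    results = []
    for ((w1, t1), (w2, t2)), count in pair_counts.items():
        if count < min_count:
            continue  # statistical filter: too rare to score reliably
        if w1 not in lexicon or w2 not in lexicon:
            continue  # dictionary filter
        if (t1, t2) not in allowed_patterns:
            continue  # rule filter on part-of-speech patterns
        p_pair = count / n_pairs
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            results.append((w1, w2, pmi))
    return sorted(results, key=lambda r: r[2], reverse=True)
```

Applying the cheap dictionary and rule filters before scoring keeps the PMI computation to pairs that are already linguistically plausible; a real system would likely also need non-adjacent pairs and a more robust association measure for low-frequency items.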