Incremental Chinese lexicon extraction with minimal resources on a domain-specific corpus

  • Authors: Gaël Patin
  • Affiliations: National Institute of Oriental Languages and Civilizations (Inalco) and Arisem, Thales Company
  • Venue: COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
  • Year: 2010

Abstract

This article presents an original lexical unit extraction system for Chinese. The method is based on an incremental process driven by an association score, and it follows a statistically aided linguistic approach that requires minimal resources. We also introduce a linguistics-based definition of the lexical unit and use it to describe an evaluation protocol dedicated to the task. The experimental results on a domain-specific corpus show that the method performs better than other approaches. The extraction results, evaluated on a random sample of the working corpus, show a recall of 68.4% and a precision of 37.1%.
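
The abstract does not specify the association score or the linguistic constraints used, so the following is only a minimal sketch of what an incremental, score-driven extraction loop could look like. It assumes pointwise mutual information (PMI) as a stand-in score, starts from single characters, and merges one adjacent pair per pass; the function names, the threshold, and the minimum count are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def merge_pair(seq, pair):
    """Replace every adjacent occurrence of `pair` in `seq` with the joined unit."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def extract_lexicon(sentences, threshold=1.0, min_count=2, max_iter=50):
    """Incrementally build candidate lexical units from unsegmented text.

    Assumption: PMI stands in for the paper's association score, and
    `threshold` / `min_count` are illustrative values only.
    """
    # Start from individual characters, since no segmenter is assumed.
    seqs = [list(s) for s in sentences]
    lexicon = []
    for _ in range(max_iter):
        units = Counter(u for s in seqs for u in s)
        pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
        n_units = sum(units.values())
        n_pairs = sum(pairs.values())

        # Pick the adjacent pair with the highest PMI above the threshold.
        best, best_score = None, threshold
        for (a, b), count in pairs.items():
            if count < min_count:
                continue
            p_pair = count / n_pairs
            p_indep = (units[a] / n_units) * (units[b] / n_units)
            score = math.log2(p_pair / p_indep)
            if score > best_score:
                best, best_score = (a, b), score

        if best is None:
            break  # no remaining pair is strongly associated enough

        # Merge the winning pair everywhere, then recount on the next pass.
        seqs = [merge_pair(s, best) for s in seqs]
        lexicon.append(best[0] + best[1])
    return lexicon
```

Because counts are recomputed after each merge, a unit accepted in one pass can combine with its neighbours in a later pass, which is the sense in which such a loop is incremental. Calling `extract_lexicon` on a list of unsegmented Chinese sentences returns the merged units in the order they were accepted; a real system would add the linguistic filtering that the abstract alludes to.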