A machine learning parser using an unlexicalized distituent model

  • Authors:
  • Samuel W. K. Chan; Lawrence Y. L. Cheung; Mickey W. C. Chong

  • Affiliations:
  • Dept. of Decision Sciences, Chinese University of Hong Kong, Shatin, Hong Kong SAR (all authors)

  • Venue:
  • CICLing'10: Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2010

Abstract

Despite the popularity of lexicalized parsing models, practical concerns such as data sparseness and applicability to domains with different vocabularies mean that unlexicalized models, which do not refer to the word tokens themselves, deserve more attention. We have developed a classifier-based parser that uses an unlexicalized parsing model. A machine learning method integrates linguistic attributes and information-theoretic attributes in two tasks, namely sentence chunking and phrase recognition. Most importantly, to enhance the accuracy of these tasks, we investigated the notion of distituency (the possibility that two parts of speech cannot remain in the same constituent or phrase) and incorporated it as attributes using various statistical measures. The parser was applied to English and Chinese sentences in the Penn Treebank and the Tsinghua Chinese Treebank. It achieved an F-score of 80.3% on English and 82.4% on Chinese.
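
To make the idea of a statistical distituency attribute concrete, the sketch below scores adjacent part-of-speech pairs by pointwise mutual information (PMI) and treats low-scoring pairs as candidate phrase boundaries. The abstract does not say which measures the authors used, so PMI, the zero threshold, and the function names `adjacent_tag_pmi` and `likely_boundaries` are assumptions made purely for illustration, not the paper's method.

```python
import math
from collections import Counter


def adjacent_tag_pmi(tagged_sentences):
    """Return the PMI of every adjacent POS-tag pair seen in the corpus.

    tagged_sentences is a list of sentences, each a list of POS tags.
    PMI is an illustrative choice; the paper's measures are not specified.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tags in tagged_sentences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (left, right), count in bigrams.items():
        p_pair = count / n_bi
        p_left = unigrams[left] / n_uni
        p_right = unigrams[right] / n_uni
        pmi[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return pmi


def likely_boundaries(tags, pmi, threshold=0.0):
    """Return indices i where tags[i] and tags[i + 1] look like distituents.

    A pair never seen in training is scored as -infinity, so it always
    counts as a candidate boundary (an assumption of this sketch).
    """
    return [
        i
        for i, pair in enumerate(zip(tags, tags[1:]))
        if pmi.get(pair, float("-inf")) < threshold
    ]


if __name__ == "__main__":
    # Tiny hand-made corpus of POS-tag sequences, for demonstration only.
    corpus = [
        ["DT", "NN", "VBD", "DT", "NN"],
        ["DT", "NN", "VBD"],
        ["DT", "JJ", "NN", "VBD", "DT", "NN"],
    ]
    scores = adjacent_tag_pmi(corpus)
    print(scores[("DT", "NN")])
    print(likely_boundaries(["DT", "NN", "NN", "DT"], scores))
```

Because the scores depend only on tag sequences and never on word tokens, a feature of this kind stays unlexicalized. In a classifier-based parser it would be one attribute among several rather than a decision rule in its own right.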