Period disambiguation with maxent model

  • Authors:
  • Chunyu Kit; Xiaoyue Liu

  • Affiliations:
  • Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong;Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong

  • Venue:
  • IJCNLP'05: Proceedings of the Second International Joint Conference on Natural Language Processing
  • Year:
  • 2005

Abstract

This paper presents our recent work on period disambiguation, the core problem in sentence boundary identification, using a maximum entropy (Maxent) model. We conduct a number of experiments on the PTB-II WSJ corpus to investigate how the context window, the feature space, and lexical information such as abbreviated and sentence-initial words affect learning performance. Such lexical information can be acquired automatically from a training corpus by a learner. Our experimental results show that extending the feature space to integrate these two kinds of lexical information can eliminate 93.52% of the errors remaining from the baseline Maxent model, achieving an F-score of 99.8227%.
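
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of how abbreviation and sentence-initial word lists might be collected from a segmented training corpus and turned into context-window features for a period-bearing token. It is not the authors' implementation: the function names (acquire_lexicons, period_features), the feature templates, and the assumption that the corpus is given as lists of tokens per sentence are all hypothetical choices made for illustration.

```python
def acquire_lexicons(sentences):
    """Collect two word lists from sentence-segmented training text.

    `sentences` is assumed to be a list of token lists (hypothetical format).
    """
    abbreviations, initials = set(), set()
    for sent in sentences:
        if not sent:
            continue
        # The first token of every training sentence is a sentence-initial word.
        initials.add(sent[0])
        # A token that ends in "." but is not the last token of its sentence
        # does not close the sentence, so it is recorded as an abbreviation.
        for tok in sent[:-1]:
            if tok.endswith(".") and any(c.isalpha() for c in tok):
                abbreviations.add(tok.lower())
    return abbreviations, initials


def period_features(tokens, i, abbreviations, initials, window=1):
    """Binary features for the period-bearing token tokens[i].

    The feature names are illustrative assumptions, not taken from the paper.
    """
    feats = {}
    # Context-window features: the words within `window` positions of the period.
    for offset in range(-window, window + 1):
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats[f"w[{offset}]={word.lower()}"] = 1
    # Lexical features drawn from the two automatically acquired lists.
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "<PAD>"
    feats["is_abbrev"] = int(tokens[i].lower() in abbreviations)
    feats["next_is_initial"] = int(nxt in initials)
    feats["next_capitalized"] = int(nxt[:1].isupper())
    return feats
```

The resulting feature dictionaries could then be passed to any maximum entropy (multinomial logistic regression) learner. Widening `window` corresponds to the context-window variation mentioned in the abstract, and dropping the three lexical features would approximate a baseline without lexical information.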