The Application of CRFs in Part-of-Speech Tagging

  • Authors:
  • Xiaofei Zhang;Heyan Huang;Zhang Liang

  • Venue:
  • IHMSC '09 Proceedings of the 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics - Volume 02
  • Year:
  • 2009

Abstract

Conditional random fields (CRFs) for sequence labeling offer advantages over both generative models, such as the hidden Markov model (HMM), and classifiers applied independently at each sequence position. First, CRFs are not bound by the independence assumptions of generative models, so they can depend on arbitrary, non-independent features without having to model the distribution of those features. Because CRF models can flexibly use a wide variety of features, the training-data sparsity problem can be addressed effectively. Moreover, parameter estimation for CRFs is global, which effectively resolves the label bias problem. In this paper, CRFs with Gaussian prior smoothing are used for part-of-speech (POS) tagging. Experiments show that, relative to an HMM-based baseline, the POS tagging error rate is reduced by 55.17% in the closed test and 43.64% in the open test, with accuracies of 98.05% and 95.79%, respectively. These results confirm the superior performance of CRFs.
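
The reported figures can be related to one another with a little arithmetic. The sketch below assumes that the 55.17% and 43.64% figures are relative error-rate reductions, i.e. (baseline error - CRF error) / baseline error, and that error rate is 100% minus accuracy. Under that assumption it back-computes the HMM baseline error rate implied by each pair of numbers. The function names are illustrative, and the baseline values it produces are derived here, not reported in the abstract.

```python
def baseline_error(model_accuracy_pct: float, relative_reduction_pct: float) -> float:
    """Baseline error rate (%) implied by a model accuracy and a relative error reduction.

    Assumes reduction = (baseline_error - model_error) / baseline_error.
    """
    model_error = 100.0 - model_accuracy_pct
    return model_error / (1.0 - relative_reduction_pct / 100.0)


def relative_error_reduction(baseline_error_pct: float, model_error_pct: float) -> float:
    """Relative error reduction (%) of a model over a baseline."""
    return 100.0 * (baseline_error_pct - model_error_pct) / baseline_error_pct


if __name__ == "__main__":
    # Accuracy and error-reduction figures as stated in the abstract.
    reported = {"closed test": (98.05, 55.17), "open test": (95.79, 43.64)}
    for name, (accuracy, reduction) in reported.items():
        implied = baseline_error(accuracy, reduction)
        print(f"{name}: CRF error {100.0 - accuracy:.2f}%, "
              f"implied HMM baseline error {implied:.2f}%")
```

Under these assumptions the implied HMM baseline error rates are roughly 4.35% (closed test) and 7.47% (open test), which correspond to baseline accuracies of about 95.65% and 92.53%.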