Adaptive Bayesian HMM for Fully Unsupervised Chinese Part-of-Speech Induction

  • Authors:
  • Lidan Zhang; Kwok-Ping Chan

  • Affiliations:
  • The University of Hong Kong;The University of Hong Kong

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2012

Abstract

We propose an adaptive Bayesian hidden Markov model for fully unsupervised part-of-speech (POS) induction. The model and its inference algorithm extend the first-order Bayesian HMM with Dirichlet priors in two ways. First, the algorithm infers the optimal number of hidden states from the training corpus rather than fixing the dimensionality of the state space beforehand. Second, a Chinese unknown-word processing module measures similarity using both morphological properties and context distributions. Experimental results show that both extensions help to find the optimal categories for Chinese, in terms of both unsupervised clustering metrics and grammar induction accuracy on the Chinese Treebank.
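
As a rough illustration of the baseline that the abstract builds on, the sketch below computes the Dirichlet-smoothed posterior predictive transition and emission probabilities of a first-order HMM from one fixed tag assignment. It is not the authors' adaptive algorithm: the number of states is fixed here rather than inferred, unknown words are not handled, and the function names, the toy sentence, and the hyperparameter values (`alpha`, `beta`) are all assumptions made for illustration.

```python
from collections import Counter


def predictive_probs(counts, outcomes, prior):
    """Posterior predictive of a multinomial under a symmetric Dirichlet prior."""
    total = sum(counts[o] for o in outcomes)
    denom = total + prior * len(outcomes)
    return {o: (counts[o] + prior) / denom for o in outcomes}


def hmm_predictives(words, tags, num_states, vocab, alpha=0.1, beta=0.01):
    """Smoothed transition and emission estimates for a given tag assignment.

    alpha and beta are hypothetical symmetric Dirichlet hyperparameters for
    the transition and emission distributions respectively.
    """
    states = list(range(num_states))
    trans = {s: Counter() for s in states}
    emit = {s: Counter() for s in states}
    # Count tag bigrams (first-order transitions).
    for prev, cur in zip(tags, tags[1:]):
        trans[prev][cur] += 1
    # Count how often each state emits each word.
    for word, tag in zip(words, tags):
        emit[tag][word] += 1
    trans_p = {s: predictive_probs(trans[s], states, alpha) for s in states}
    emit_p = {s: predictive_probs(emit[s], vocab, beta) for s in states}
    return trans_p, emit_p


# Toy corpus with an arbitrary (assumed) assignment of three hidden states.
words = ["我", "爱", "北京", "我", "爱", "香港"]
tags = [0, 1, 2, 0, 1, 2]
vocab = sorted(set(words))
trans_p, emit_p = hmm_predictives(words, tags, 3, vocab)
```

In a sampler these estimates would be recomputed as tag assignments change. The prior keeps unseen transitions and emissions at a small non-zero probability, which is what lets a Bayesian HMM prefer sparse tag distributions without ruling anything out.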