Homotopy-based semi-supervised Hidden Markov Models for sequence labeling

  • Authors: Gholamreza Haffari; Anoop Sarkar

  • Affiliations: Simon Fraser University, Burnaby, BC, Canada (both authors)

  • Venue: COLING '08: Proceedings of the 22nd International Conference on Computational Linguistics, Volume 1
  • Year: 2008

Abstract

This paper explores the use of the homotopy method for training a semi-supervised Hidden Markov Model (HMM) for sequence labeling. We provide a novel polynomial-time algorithm that traces a local maximum of the HMM likelihood function along a path from full weight on the labeled data to full weight on the unlabeled data. We present an experimental analysis of different techniques for choosing the best balance between labeled and unlabeled data based on characteristics observed along this path. Furthermore, experimental results on the field segmentation task in information extraction show that the homotopy-based method significantly outperforms EM-based semi-supervised learning, and provides a more accurate alternative to using held-out data to pick the best balance for combining labeled and unlabeled data.
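
The path "from full weight on the labeled data to full weight on the unlabeled data" can be pictured with an interpolated objective. The form below is a sketch under assumptions, not a formula taken from the abstract: the symbols $\lambda$, $\theta$, $D_l$ and $D_u$ are introduced here for illustration, and any per-dataset normalization the paper may use is omitted.

```latex
% Assumed notation (not from the abstract):
%   \theta  : HMM parameters
%   D_l     : labeled sequences (x, y)
%   D_u     : unlabeled sequences x
%   \lambda : weight placed on the unlabeled data
\[
  \mathcal{L}_{\lambda}(\theta)
  = (1-\lambda) \sum_{(x,y) \in D_l} \log P_{\theta}(x, y)
  \;+\; \lambda \sum_{x \in D_u} \log \sum_{y} P_{\theta}(x, y),
  \qquad \lambda \in [0, 1].
\]
```

Under this reading, $\lambda = 0$ gives purely supervised maximum-likelihood training and $\lambda = 1$ gives purely unsupervised training. A homotopy method would follow a local maximizer $\theta^{*}(\lambda)$ as $\lambda$ increases from 0, and choosing the balance between labeled and unlabeled data amounts to choosing a point $\lambda$ along that path.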