Improving word alignment by semi-supervised ensemble

  • Authors:
  • Shujian Huang; Kangxi Li; Xinyu Dai; Jiajun Chen

  • Affiliations:
  • Nanjing University, Nanjing, P.R. China (all four authors)

  • Venue:
  • CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
  • Year:
  • 2010

Abstract

Supervised learning has recently been used to improve the performance of word alignment. However, because labeled data are scarce, the performance of "pure" supervised learning, which uses only labeled data, is limited. As a result, many existing methods employ features learned from a large amount of unlabeled data to assist the task. In this paper, we propose a semi-supervised ensemble method that better incorporates both labeled and unlabeled data during learning. First, we employ an ensemble learning framework that effectively uses the alignment results of different unsupervised alignment models. We then use a semi-supervised learning method, namely Tri-training, to train classifiers collaboratively on both labeled and unlabeled data, further improving the result. Experimental results show that our methods can substantially improve the quality of word alignment. The final translation quality of a phrase-based translation system is also slightly improved.
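The abstract names Tri-training (Zhou and Li, 2005) but does not give the paper's exact setup, so the following is only a minimal sketch of the generic Tri-training loop under stated assumptions, not the authors' implementation. Three learners are trained on bootstrap samples of the labeled data; in each round, a learner receives the unlabeled examples on which the other two agree as pseudo-labeled data, but only if the estimated error of that agreement has dropped enough. Everything specific here is hypothetical: the nearest-centroid base learner `train_centroid`, the helper names `tri_train` and `vote`, the 2-D toy feature vectors, and the stratified bootstrap (the original draws a plain bootstrap). The subsampling branch of the original update rule is also omitted. In a word-alignment setting, each example might be a candidate link whose features record which unsupervised aligners proposed it, but that mapping is likewise an assumption.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Example = Tuple[Vector, int]
Classifier = Callable[[Vector], int]


def train_centroid(data: Sequence[Example]) -> Classifier:
    """Hypothetical base learner: predict the label whose centroid is nearest."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def tri_train(labeled: Sequence[Example], unlabeled: Sequence[Vector],
              learn=train_centroid, seed: int = 0, max_rounds: int = 20) -> List[Classifier]:
    rng = random.Random(seed)

    def bootstrap() -> List[Example]:
        # Assumption: resample within each class so every learner sees every label.
        by_class: Dict[int, List[Example]] = {}
        for ex in labeled:
            by_class.setdefault(ex[1], []).append(ex)
        sample: List[Example] = []
        for exs in by_class.values():
            sample.extend(rng.choice(exs) for _ in exs)
        return sample

    hs = [learn(bootstrap()) for _ in range(3)]
    prev_err = [0.5] * 3  # e'_i: error of the other two learners at the last update
    prev_size = [0] * 3   # l'_i: size of the pseudo-labeled set at the last update
    for _ in range(max_rounds):
        new_sets = [None] * 3
        for i in range(3):
            hj, hk = (hs[m] for m in range(3) if m != i)
            # Estimate the error of (hj, hk) on labeled examples where they agree.
            agree = [(x, y) for x, y in labeled if hj(x) == hk(x)]
            if not agree:
                continue
            err = sum(hj(x) != y for x, y in agree) / len(agree)
            if err >= prev_err[i]:
                continue
            pseudo = [(x, hj(x)) for x in unlabeled if hj(x) == hk(x)]
            if prev_size[i] == 0:
                prev_size[i] = int(err / (prev_err[i] - err) + 1)
            # Accept only if the expected number of noisy labels shrinks.
            if prev_size[i] < len(pseudo) and err * len(pseudo) < prev_err[i] * prev_size[i]:
                new_sets[i] = (pseudo, err)
        if all(s is None for s in new_sets):
            break  # no learner changed, so the loop has converged
        # Retrain after the round so all learners use the previous round's peers.
        for i, s in enumerate(new_sets):
            if s is not None:
                pseudo, err = s
                hs[i] = learn(list(labeled) + pseudo)
                prev_err[i], prev_size[i] = err, len(pseudo)
    return hs


def vote(hs: Sequence[Classifier], x: Vector) -> int:
    """Majority vote of the three learners."""
    preds = [h(x) for h in hs]
    return max(set(preds), key=preds.count)


if __name__ == "__main__":
    labeled = [((0.0, 0.0), 0), ((0.5, 0.2), 0), ((0.2, 0.6), 0),
               ((10.0, 10.0), 1), ((9.6, 10.3), 1), ((10.4, 9.8), 1)]
    unlabeled = [(0.3, 0.1), (0.1, 0.4), (9.9, 10.1), (10.2, 9.7)]
    hs = tri_train(labeled, unlabeled)
    print(vote(hs, (0.4, 0.4)), vote(hs, (9.8, 9.9)))  # prints: 0 1
```

The design choice worth noting is the acceptance test: a learner is retrained only when the other two agree with a lower estimated error than at its last update and the product of error and pseudo-labeled set size decreases, which is how Tri-training limits the noise it injects from unlabeled data.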