Active semi-supervised learning for improving word alignment

Authors:
Vamshi Ambati;Stephan Vogel;Jaime Carbonell
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
ALNLP '10 Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing
Year:
2010

Citing 23
Cited 1

Selective Sampling Using the Query by Committee Algorithm

Machine Learning
A systematic comparison of various statistical alignment models

Computational Linguistics
Support vector machine active learning with applications to text classification

The Journal of Machine Learning Research
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Active learning using pre-clustering

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Active learning for statistical natural language parsing

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Sample Selection for Statistical Parsing

Computational Linguistics
Statistical machine translation with word- and sentence-aligned parallel corpora

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Multi-criteria-based active learning for named entity recognition

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Semi-supervised training for statistical word alignment

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Confidence estimation for machine translation

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Word-Level Confidence Estimation for Machine Translation

Computational Linguistics
Soft syntactic constraints for word alignment through discriminative training

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Boosting statistical word alignment using labeled and unlabeled data

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Measuring Word Alignment Quality for Statistical Machine Translation

Computational Linguistics
Optimizing estimated loss reduction for active sampling in rank learning

Proceedings of the 25th international conference on Machine learning
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Active learning for statistical phrase-based machine translation

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Parallel implementations of word alignment tool

SETQA-NLP '08 Software Engineering, Testing, and Quality Assurance for Natural Language Processing
Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Confidence measure for word alignment

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Semi-supervised active learning for sequence labeling

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2

A semi-supervised word alignment algorithm with partial manual alignments

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Quantified Score

Hi-index	0.00

Visualization

Abstract

Word alignment models form an important part of building statistical machine translation systems. Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial alignments acquired from humans. Such dedicated elicitation effort is often expensive and depends on availability of bilingual speakers for the language-pair. In this paper we study active learning query strategies to carefully identify highly uncertain or most informative alignment links that are proposed under an unsupervised word alignment model. Manual correction of such informative links can then be applied to create a labeled dataset used by a semi-supervised word alignment model. Our experiments show that using active learning leads to maximal reduction of alignment error rates with reduced human effort.