Semi-supervised training for statistical word alignment

Authors:
Alexander Fraser;Daniel Marcu
Affiliations:
University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Citing 15
Cited 23

Future paths for integer programming and links to artificial intelligence

Computers and Operations Research - Special issue: Applications of integer programming
A view of the EM algorithm that justifies incremental, sparse, and other variants

Learning in graphical models
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Expectation Maximization for Weakly Labeled Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A systematic comparison of various statistical alignment models

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
HMM-based word alignment in statistical translation

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
A probabilistic framework for semi-supervised clustering

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
A probability model to improve word alignment

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Minimum error rate training in statistical machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Statistical machine translation with word- and sentence-aligned parallel corpora

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Log-linear models for word alignment

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A discriminative matching approach to word alignment

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A discriminative framework for bilingual word alignment

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A maximum entropy word aligner for Arabic-English machine translation

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing

Measuring Word Alignment Quality for Statistical Machine Translation

Computational Linguistics
Statistical machine translation

ACM Computing Surveys (CSUR)
Semi-supervised model adaptation for statistical machine translation

Machine Translation
Yawat: yet another word alignment tool

HLT-Demonstrations '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Demo Session
A new objective function for word alignment

ILP '09 Proceedings of the Workshop on Integer Linear Programming for Natural Langauge Processing
Combination of statistical word alignments based on multiple preprocessing schemes

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Discriminative alignment training without annotated data for machine translation

NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Inversion transduction grammar for joint phrasal translation modeling

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Discriminative word alignment by learning the alignment structure and syntactic divergence between a language pair

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Improving alignment for SMT by reordering and augmenting the training corpus

StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Two tools for creating and visualizing sub-sentential alignments of parallel text

LAW '07 Proceedings of the Linguistic Annotation Workshop
Discriminative pruning for discriminative ITG alignment

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Discriminative modeling of extraction sets for machine translation

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Active learning-based elicitation for semi-supervised word alignment

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Active semi-supervised learning for improving word alignment

ALNLP '10 Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing
Consensus versus expertise: a case study of word alignment with Mechanical Turk

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
A semi-supervised word alignment algorithm with partial manual alignments

WMT '10 Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
Improving word alignment by semi-supervised ensemble

CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
EMDC: a semi-supervised approach for word alignment

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Improved discriminative ITG alignment using hierarchical phrase pairs and semi-supervised training

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Discriminative word alignment by linear modeling

Computational Linguistics
Multi-task learning for word alignment and dependency parsing

AICI'11 Proceedings of the Third international conference on Artificial intelligence and computational intelligence - Volume Part III
Forced derivation tree based model training to statistical machine translation

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality.