Are very large n-best lists useful for SMT?

Authors:
Saša Hasan;Richard Zens;Hermann Ney
Affiliations:
RWTH Aachen University, Aachen, Germany (all authors)
Venue:
NAACL-Short '07 Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers
Year:
2007

Citing 4

Finding the k Shortest Paths
SIAM Journal on Computing

Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1

Generation of word graphs in statistical machine translation
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10

Word graphs for statistical machine translation
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts

Cited 5

The feature subspace method for SMT system combination
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3

Improving phrase-based translation with prototypes of short phrases
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Instance selection for machine translation using feature decay algorithms
WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation

DFKI's SMT system for WMT 2012
WMT '12 Proceedings of the Seventh Workshop on Statistical Machine Translation

Bagging and Boosting statistical machine translation systems
Artificial Intelligence

Abstract

This paper describes an efficient method for extracting large n-best lists from a word graph produced by a statistical machine translation system. The extraction is based on the k shortest paths algorithm, which remains efficient even for very large k. We show that, although we can generate large numbers of distinct translation hypotheses, these additional candidates do not significantly improve overall system performance. We conclude that large n-best lists would benefit from better discriminating models.
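
To make the idea of n-best extraction from a word graph concrete, here is a minimal sketch in Python. It is not the k shortest paths algorithm the abstract refers to. It swaps in a simpler best-first enumeration that uses exact remaining-cost estimates, which also returns paths in order of increasing cost on an acyclic graph. The function name `n_best`, the edge format `(from_node, to_node, word, cost)`, and the toy graph are assumptions made for illustration. Costs stand in for negated log model scores, so lower is better.

```python
import heapq
from collections import defaultdict
from itertools import count
from math import inf


def n_best(edges, start, goal, n):
    """Return up to n distinct (cost, sentence) pairs, cheapest first.

    Assumes the word graph is acyclic and the goal node has no outgoing edges.
    """
    outgoing = defaultdict(list)
    for u, v, word, cost in edges:
        outgoing[u].append((v, word, cost))

    # Exact cost of the cheapest completion from each node to the goal.
    memo = {goal: 0.0}

    def rest(node):
        if node not in memo:
            memo[node] = min(
                (c + rest(v) for v, _, c in outgoing[node]), default=inf
            )
        return memo[node]

    tie = count()  # tie-breaker so the heap never compares word tuples
    heap = [(rest(start), next(tie), 0.0, start, ())]
    results, seen = [], set()

    while heap and len(results) < n:
        _, _, cost_so_far, node, words = heapq.heappop(heap)
        if node == goal:
            # Different paths can spell the same sentence; keep the first.
            if words not in seen:
                seen.add(words)
                results.append((cost_so_far, " ".join(words)))
            continue
        for v, word, c in outgoing[node]:
            if rest(v) == inf:
                continue  # dead end, the goal is unreachable from v
            new_cost = cost_so_far + c
            heapq.heappush(
                heap, (new_cost + rest(v), next(tie), new_cost, v, words + (word,))
            )
    return results


# Hypothetical toy word graph with two alternatives per position.
edges = [
    (0, 1, "this", 1.0), (0, 1, "that", 7.0),
    (1, 2, "is", 2.0), (1, 2, "was", 9.0),
    (2, 3, "good", 3.0), (2, 3, "fine", 5.0),
]

for cost, sentence in n_best(edges, 0, 3, 3):
    print(cost, sentence)
```

Because the priority of each partial path is its cost so far plus the exact cheapest completion, complete hypotheses leave the queue in nondecreasing order of total cost, and the loop can stop as soon as n distinct sentences have been collected. This sketch stores the full word sequence with every queue entry, so it is meant for illustration and makes no claim to the efficiency for very large k that the paper reports.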