Phrasal: a toolkit for statistical machine translation with facilities for extraction and incorporation of arbitrary model features

  • Authors:
  • Daniel Cer; Michel Galley; Daniel Jurafsky; Christopher D. Manning

  • Affiliations:
  • Stanford University, Stanford, CA (all authors)

  • Venue:
  • HLT-DEMO '10 Proceedings of the NAACL HLT 2010 Demonstration Session
  • Year:
  • 2010

Abstract

We present a new Java-based open-source toolkit for phrase-based machine translation. The toolkit's key innovation is a set of APIs for integrating new features (knowledge sources) into the decoding model and for extracting feature statistics from aligned bitexts. The package includes a number of useful features written to these APIs, including features for hierarchical reordering, discriminatively trained linear distortion, and syntax-based language models. Other utilities packaged with the toolkit include a conditional phrase extraction system that builds a phrase table for a specific dataset only, and an implementation of MERT (minimum error rate training) with pluggable evaluation metrics for both training and evaluation, with built-in support for a variety of metrics (e.g., TERp, BLEU, METEOR).
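
To make the idea of a feature API concrete, the following is a minimal sketch of how pluggable features can feed a log-linear decoding score. It is not Phrasal's actual API: the names `FeatureSketch`, `Phrase`, `Featurizer`, `LinearDistortion`, `PhraseCount`, and `score` are all hypothetical, and the weights are arbitrary illustrative values. Each feature is assumed to see only the previous and current phrase application, and source spans are assumed to be half-open intervals [start, end).

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a pluggable feature API; NOT Phrasal's real interfaces.
public class FeatureSketch {

    /** A phrase application covering the source span [start, end). */
    static final class Phrase {
        final int start;
        final int end;
        Phrase(int start, int end) { this.start = start; this.end = end; }
    }

    /** Assumed plug-in point: one named feature value per phrase application. */
    interface Featurizer {
        String name();
        double value(Phrase previous, Phrase current);
    }

    /** Linear distortion: negated distance jumped in the source sentence. */
    static final class LinearDistortion implements Featurizer {
        public String name() { return "LinearDistortion"; }
        public double value(Phrase previous, Phrase current) {
            // Before the first phrase, decoding is assumed to start at position 0.
            int prevEnd = (previous == null) ? 0 : previous.end;
            return -Math.abs(current.start - prevEnd);
        }
    }

    /** Fires once per phrase, so its weight acts as a phrase penalty. */
    static final class PhraseCount implements Featurizer {
        public String name() { return "PhraseCount"; }
        public double value(Phrase previous, Phrase current) { return 1.0; }
    }

    /** Log-linear model score: sum over phrases and features of weight * value. */
    static double score(List<Phrase> derivation,
                        List<Featurizer> featurizers,
                        Map<String, Double> weights) {
        double total = 0.0;
        Phrase previous = null;
        for (Phrase current : derivation) {
            for (Featurizer f : featurizers) {
                // Features without a configured weight contribute nothing.
                total += weights.getOrDefault(f.name(), 0.0) * f.value(previous, current);
            }
            previous = current;
        }
        return total;
    }

    public static void main(String[] args) {
        // Translate source words 0-1 first, jump to words 3-4, then go back for word 2.
        List<Phrase> derivation =
            List.of(new Phrase(0, 2), new Phrase(3, 5), new Phrase(2, 3));
        List<Featurizer> featurizers =
            List.of(new LinearDistortion(), new PhraseCount());
        Map<String, Double> weights =
            Map.of("LinearDistortion", 0.5, "PhraseCount", -1.0);
        System.out.println(score(derivation, featurizers, weights)); // prints -5.0
    }
}
```

In a design like this, adding a new knowledge source means writing one more `Featurizer` implementation and assigning it a weight. The decoder loop itself does not change, and the weights are what a tuning procedure such as MERT would adjust.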