Stochastic inversion transduction grammars with application to segmentation, bracketing, and alignment of parallel corpora

  • Authors:
  • Dekai Wu

  • Affiliations:
  • HKUST, Department of Computer Science, University of Science & Technology, Clear Water Bay, Hong Kong

  • Venue:
  • IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2
  • Year:
  • 1995

Abstract

We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing, with potential application to a variety of parallel corpus analysis problems. The formalism combines three tactics against the constraints that render finite-state transducers less useful: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist, and we discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks.
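To make the idea of bilingual parsing concrete, the following is a minimal sketch rather than the paper's algorithm as published. It assumes a single-nonterminal bracketing grammar with only three rule types: straight concatenation `[A A]`, inverted concatenation `<A A>`, and one-to-one lexical pairs `x/y`. Rules that pair a word with nothing are omitted, so both sentences must have the same length. The function name `best_itg_parse`, the rule weights, and the toy English/French lexicon are all hypothetical, and the weights are illustrative scores, not a normalized distribution.

```python
from functools import lru_cache

def best_itg_parse(e, f, lex, p_straight=0.4, p_inverted=0.2):
    """Return (score, bracketing) of the best parse, or (0.0, None) if none exists."""
    e, f = tuple(e), tuple(f)

    @lru_cache(maxsize=None)
    def best(s, t, u, v):
        # Best derivation pairing the spans e[s:t] and f[u:v].
        if t - s != v - u:
            # Assumption of this sketch: only 1-to-1 lexical rules,
            # so paired spans must have equal length.
            return 0.0, None
        if t - s == 1:
            p = lex.get((e[s], f[u]), 0.0)
            return (p, f"{e[s]}/{f[u]}") if p > 0 else (0.0, None)
        top = (0.0, None)
        for split in range(s + 1, t):
            k = split - s  # length of the left part of the e-span
            # Straight rule: left e-part pairs with left f-part.
            pl, tl = best(s, split, u, u + k)
            pr, tr = best(split, t, u + k, v)
            p = p_straight * pl * pr
            if p > top[0]:
                top = (p, f"[{tl} {tr}]")
            # Inverted rule: left e-part pairs with right f-part.
            pl, tl = best(s, split, v - k, v)
            pr, tr = best(split, t, u, v - k)
            p = p_inverted * pl * pr
            if p > top[0]:
                top = (p, f"<{tl} {tr}>")
        return top

    return best(0, len(e), 0, len(f))

# Toy lexicon (hypothetical scores).
lex = {("the", "la"): 0.9, ("red", "rouge"): 0.8, ("car", "voiture"): 0.7}
score, tree = best_itg_parse(["the", "red", "car"], ["la", "voiture", "rouge"], lex)
print(tree)  # [the/la <red/rouge car/voiture>]
```

Square brackets mark constituents whose order is the same in both languages, and angle brackets mark constituents whose order is inverted in the second language. In this toy pair the adjective-noun swap is captured by a single inverted node.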