Pre-processing closed captions for machine translation

Authors:
Davide Turcato;Fred Popowich;Paul McFetridge;Devlan Nicholson;Janine Toole
Affiliations:
Simon Fraser University, Burnaby, British Columbia, Canada;Simon Fraser University, Burnaby, British Columbia, Canada;Simon Fraser University, Burnaby, British Columbia, Canada;Simon Fraser University, Burnaby, British Columbia, Canada;Simon Fraser University, Burnaby, British Columbia, Canada
Venue:
NAACL-ANLP-EMTS '00 Proceedings of the 2000 NAACL-ANLP Workshop on Embedded machine translation systems - Volume 5
Year:
2000

Citing 4
Cited 0

Input Segmentation of Spontaneous Speech in JANUS: A Speech-to-speech Translation System
ECAI '96 Workshop on Dialogue Processing in Spoken Language Systems

Disambiguation of proper names in text
ANLC '97 Proceedings of the fifth conference on Applied natural language processing

Automatic processing of proper names in texts
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics

Deterministic parsing of syntactic non-fluencies
ACL '83 Proceedings of the 21st annual meeting on Association for Computational Linguistics

Abstract

We describe an approach to Machine Translation of transcribed speech, as found in closed captions. We discuss how the colloquial nature and input format peculiarities of closed captions are handled by a pre-processing pipeline that prepares the input for effective processing by a core MT system. In particular, we describe components for proper name recognition and input segmentation, and we evaluate the contribution of these modules to system performance. The described methods have been implemented in an MT system that translates English closed captions into Spanish and Portuguese.
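
The pipeline idea in the abstract (clean up caption input, segment it, mark proper names, then hand the result to the core MT system) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so everything here is an assumption made for illustration, including the `>>` speaker-change marker, punctuation-based segmentation, the tiny `KNOWN_NAMES` list, and all function names.

```python
import re
from dataclasses import dataclass, field

# Hypothetical name list for illustration only; the abstract does not say
# how the real system recognizes proper names.
KNOWN_NAMES = {"john", "maria", "vancouver"}


@dataclass
class Segment:
    """One translation unit plus the proper names found in it."""
    text: str
    proper_names: list = field(default_factory=list)


def join_caption_lines(lines):
    """Merge caption rows into one string, dropping leading '>>' markers.

    Assumption: rows may break mid-sentence and '>>' marks a speaker change.
    """
    cleaned = [re.sub(r"^>+\s*", "", line.strip()) for line in lines]
    return " ".join(part for part in cleaned if part)


def segment(text):
    """Split the text after sentence-final punctuation (a simple stand-in)."""
    return [part for part in re.split(r"(?<=[.?!])\s+", text) if part]


def tag_proper_names(text):
    """Return tokens found in KNOWN_NAMES, ignoring case (captions are often upper case)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [tok for tok in tokens if tok.lower() in KNOWN_NAMES]


def preprocess(lines):
    """Run the sketch pipeline: join rows, segment, then tag names per segment."""
    text = join_caption_lines(lines)
    return [Segment(s, tag_proper_names(s)) for s in segment(text)]


if __name__ == "__main__":
    for seg in preprocess([">> HI JOHN. ARE YOU", "GOING TO VANCOUVER?"]):
        print(seg.text, seg.proper_names)
```

In this sketch a sentence split across two caption rows is rejoined before segmentation, so each segment is a complete unit, and the tagged names could then be protected from being translated as ordinary words.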