Corpus-based annotated test set for machine translation evaluation by an industrial user

Authors:
Eva Dauphin;Véronika Lux
Affiliations:
AEROSPATIALE-CCR, Suresnes, France;AEROSPATIALE-CCR, Suresnes, France
Venue:
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Year:
1996

Abstract

This article concerns the construction of a test data set to assist industrial users in evaluating machine translation. It emphasizes the value of an approach based on studying the pragmatic characteristics of a bilingual corpus. A study of one chapter of the maintenance manual of the Super Puma helicopter made it possible to identify the pragmatic characteristics relevant to the choice of the morpho-syntactic structures and translation processes actually used. The textual test set consists of an SGML file that contains the source text sequences aligned with the reference translation sequences, together with the pragmatic, formal and translational characteristics in the form of annotations (labels and formal descriptions).
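
The test set described in the abstract pairs each source sequence with its reference translation and attaches pragmatic, formal and translational annotations. The abstract does not give the actual SGML schema, so the following is only a minimal Python sketch of what one such record might look like. The class name `AlignedItem`, the function `to_sgml`, the element names (`item`, `source`, `reference`, `annot`) and the sample sentence pair and labels are all assumptions made for illustration, not the authors' markup or data.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class AlignedItem:
    """One source sequence aligned with its reference translation.

    Hypothetical structure: the three annotation lists mirror the three
    kinds of characteristics named in the abstract.
    """
    source: str
    reference: str
    pragmatic: list[str] = field(default_factory=list)
    formal: list[str] = field(default_factory=list)
    translational: list[str] = field(default_factory=list)


def to_sgml(item: AlignedItem, item_id: int) -> str:
    """Serialize one item as SGML-style markup (tag names are assumed)."""
    parts = [
        f'<item id="{item_id}">',
        f"  <source>{escape(item.source)}</source>",
        f"  <reference>{escape(item.reference)}</reference>",
    ]
    # Emit one <annot> element per label, grouped by annotation kind.
    for kind in ("pragmatic", "formal", "translational"):
        for value in getattr(item, kind):
            parts.append(f'  <annot type="{kind}">{escape(value)}</annot>')
    parts.append("</item>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Invented example pair and labels, for illustration only.
    example = AlignedItem(
        source="Remove the panel.",
        reference="Déposer le panneau.",
        pragmatic=["instruction"],
        formal=["imperative"],
        translational=["mood change"],
    )
    print(to_sgml(example, 1))
```

Keeping the annotations next to the aligned pair, as the abstract describes, would let an evaluator select test items by characteristic (for example, every item carrying a given formal label) before comparing a system's output with the reference translation.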