Practical translation pattern acquisition from combined language resources

Authors:
Mihoko Kitamura;Yuji Matsumoto
Affiliations:
Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan;Corporate Research & Development Center, Oki Electric Industry Co., Ltd, Osaka, Japan
Venue:
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Year:
2004

Abstract

Automatic extraction of translation patterns from parallel corpora is an efficient way to build translation dictionaries, and various approaches have therefore been proposed. This paper presents a practical translation pattern extraction method that greedily extracts translation patterns based on the co-occurrence of English and Japanese word sequences. The method can also be combined effectively with manual confirmation and with linguistic resources such as chunking information and translation dictionaries. Using these additional linguistic resources yields results of higher precision and broader coverage regardless of the number of documents.
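
The greedy, co-occurrence-based extraction described in the abstract can be illustrated with a minimal sketch. The abstract does not name the association measure or the selection rule, so everything specific here is an assumption: the Dice coefficient stands in for the co-occurrence score, `max_len` and `threshold` are hypothetical parameters, ties are broken in favour of longer sequences, and each word sequence is allowed in at most one pattern. The toy corpus uses romanized Japanese tokens and is invented for illustration only.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, max_len):
    """Return the set of contiguous word sequences of length 1..max_len."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def extract_patterns(pairs, max_len=2, threshold=0.8):
    """Greedily pick (English, Japanese) sequence pairs by Dice score.

    Assumption: Dice = 2 * cooccurrence / (freq_en + freq_ja), counted
    once per sentence pair. The source does not specify this measure.
    """
    en_freq, ja_freq, co_freq = Counter(), Counter(), Counter()
    for en_tokens, ja_tokens in pairs:
        en_seqs = ngrams(en_tokens, max_len)
        ja_seqs = ngrams(ja_tokens, max_len)
        en_freq.update(en_seqs)
        ja_freq.update(ja_seqs)
        co_freq.update(product(en_seqs, ja_seqs))

    # Best score first; among equal scores, prefer longer sequences
    # (an assumed tie-break), then fall back to lexical order.
    scored = sorted(
        ((2 * c / (en_freq[e] + ja_freq[j]), e, j)
         for (e, j), c in co_freq.items()),
        key=lambda item: (-item[0], -(len(item[1]) + len(item[2])),
                          item[1], item[2]),
    )

    used_en, used_ja, patterns = set(), set(), []
    for score, e, j in scored:
        if score < threshold:
            break  # everything after this point scores lower
        if e in used_en or j in used_ja:
            continue  # greedy step: a sequence joins at most one pattern
        patterns.append((" ".join(e), " ".join(j), score))
        used_en.add(e)
        used_ja.add(j)
    return patterns


# Toy parallel corpus (hypothetical data, romanized Japanese).
pairs = [
    (["machine", "translation"], ["kikai", "honyaku"]),
    (["machine", "learning"], ["kikai", "gakushu"]),
    (["translation", "dictionary"], ["honyaku", "jisho"]),
]
patterns = extract_patterns(pairs)
for en, ja, score in patterns:
    print(f"{en} <-> {ja} ({score:.2f})")
```

In a fuller pipeline, the resources mentioned in the abstract would plausibly plug in around this loop: chunking information could restrict which word sequences `ngrams` proposes, a translation dictionary could seed or filter candidate pairs, and manual confirmation could review patterns before they are accepted. Those integration points are likewise assumptions, not details given in the abstract.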