Building a bilingual dictionary from a Japanese-Chinese patent corpus

Authors:
Keiji Yasuda;Eiichiro Sumita
Affiliations:
National Institute of Information and Communications Technology, Keihanna Science City, Kyoto, Japan;National Institute of Information and Communications Technology, Keihanna Science City, Kyoto, Japan
Venue:
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Year:
2013

Abstract

In this paper, we propose an automatic method for building a bilingual dictionary from a Japanese-Chinese parallel corpus. The proposed method applies character similarity between Japanese and Chinese and a statistical machine translation (SMT) framework in a cascading manner. The first step extracts word translation pairs from the parallel corpus based on the similarity between Japanese kanji characters (Chinese characters used in Japanese writing) and simplified Chinese characters. The second step trains phrase tables using two different SMT training tools, then extracts the word translation pairs common to both tables. The third step trains an SMT system using the word translation pairs obtained in the first and second steps. According to the experimental results, the accuracy of the extracted word translation pairs ranges from 59.3% to 92.1%, depending on the cascading step.
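
The first two steps of the cascade can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, the threshold, the conversion table, or the phrase table format, so everything below is an assumption made for illustration: `KANJI_TO_HANZI` is a hypothetical toy mapping, `char_similarity` is a simple position-wise match ratio, and the phrase tables are assumed to be already reduced to sets of single-word pairs. The third step (training an SMT system on the extracted pairs) is not sketched.

```python
from typing import Dict, Iterable, List, Set, Tuple

WordPair = Tuple[str, str]

# Hypothetical toy kanji -> simplified hanzi table (an assumption for this
# sketch; a real conversion table would cover thousands of characters).
KANJI_TO_HANZI: Dict[str, str] = {"電": "电", "気": "气", "発": "发", "録": "录"}


def to_hanzi(word: str) -> str:
    """Convert each kanji to its simplified form; unknown characters pass through."""
    return "".join(KANJI_TO_HANZI.get(ch, ch) for ch in word)


def char_similarity(ja_word: str, zh_word: str) -> float:
    """Position-wise match ratio after conversion (assumed measure, not the paper's).

    Returns 0.0 when the lengths differ or the word is empty.
    """
    converted = to_hanzi(ja_word)
    if not converted or len(converted) != len(zh_word):
        return 0.0
    matches = sum(a == b for a, b in zip(converted, zh_word))
    return matches / len(converted)


def step1_pairs(
    sentence_pairs: Iterable[Tuple[List[str], List[str]]],
    threshold: float = 1.0,  # assumed threshold; the abstract gives none
) -> Set[WordPair]:
    """Step 1: collect word pairs from aligned sentences by character similarity."""
    pairs: Set[WordPair] = set()
    for ja_words, zh_words in sentence_pairs:
        for ja in ja_words:
            for zh in zh_words:
                if char_similarity(ja, zh) >= threshold:
                    pairs.add((ja, zh))
    return pairs


def step2_pairs(table_a: Iterable[WordPair], table_b: Iterable[WordPair]) -> Set[WordPair]:
    """Step 2: keep only the word pairs found in both phrase tables."""
    return set(table_a) & set(table_b)
```

Given a pre-segmented sentence pair such as `(["電気", "装置"], ["电气", "装置"])`, `step1_pairs` would pair 電気 with 电气 and 装置 with 装置 under this toy mapping. Taking the intersection in `step2_pairs` reflects the abstract's description of keeping only pairs that both SMT training tools agree on, which trades coverage for precision.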