Chinese-Japanese clause alignment

Authors:
Xiaojie Wang;Fuji Ren
Affiliations:
School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, China;Department of Information Science & Intelligent Systems, Tokushima University, Tokushima, Japan
Venue:
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Citing 10
Cited 2

Text-translation alignment

Computational Linguistics - Special issue on using large corpora: I
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora

Computational Linguistics
Aligning sentences in parallel corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Structural matching of parallel texts

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria

ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Learning translation templates from bilingual text

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
An unsupervised method for word sense tagging using parallel corpora

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Effective phrase translation extraction from alignment models

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Automatic learning of parallel dependency treelet pairs

IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Chinese Ancient-Modern Sentence Alignment

ICCS '07 Proceedings of the 7th international conference on Computational Science, Part II
Probabilistic neural network based english-arabic sentence alignment

CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing

Quantified Score

Hi-index	0.03

Visualization

Abstract

Bi-text alignment is useful to many Natural Language Processing tasks such as machine translation, bilingual lexicography and word sense disambiguation. This paper presents a Chinese-Japanese alignment at the level of clause. After describing some characteristics in Chinese-Japanese bilingual texts, we first investigate some statistical properties of Chinese-Japanese bilingual corpus, including the correlation test of text lengths between two languages and the distribution test of length ratio data. We then pay more attention to n-m(n1 or m1) alignment modes which are prone to mismatch. We propose a similarity measure based on Hanzi characters information for these kinds of alignment modes. By using dynamic programming, we combine statistical information and Hanzi character information to find the overall least cost in aligning. Experiments show our algorithm can achieve good alignment accuracy.