Bi-text alignment is useful for many natural language processing tasks, such as machine translation, bilingual lexicography, and word sense disambiguation. This paper presents a clause-level alignment method for Chinese-Japanese bilingual texts. After describing some characteristics of Chinese-Japanese bilingual texts, we first investigate statistical properties of a Chinese-Japanese bilingual corpus, including a correlation test of text lengths between the two languages and a distribution test of length-ratio data. We then focus on n-m (n ≠ 1 or m ≠ 1) alignment modes, which are prone to mismatches, and propose a similarity measure based on Hanzi character information for these modes. Using dynamic programming, we combine statistical information and Hanzi character information to find the alignment with the least overall cost. Experiments show that our algorithm achieves good alignment accuracy.
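The abstract does not give the exact cost function, so the following is only a minimal sketch of how length statistics and Hanzi evidence might be combined in a dynamic-programming clause aligner. It assumes a Gale-Church-style length cost, which treats the length ratio as normally distributed. It also assumes a Dice overlap of shared CJK code points as a stand-in for the paper's Hanzi similarity measure, applied only to the n-m modes with n ≠ 1 or m ≠ 1 that have text on both sides. The parameter values (C, S2, HANZI_WEIGHT, MODE_PRIORS) and the helper names (hanzi_similarity, length_cost, bead_cost, align) are hypothetical illustrations, not values or functions from the paper.

```python
import math

# Hypothetical parameters (not from the paper). In practice C and S2 would be
# estimated from the corpus's Japanese/Chinese clause-length ratios.
C = 1.0             # assumed expected Japanese characters per Chinese character
S2 = 6.8            # assumed variance of the length ratio (illustrative)
HANZI_WEIGHT = 3.0  # assumed weight of the Hanzi similarity term

# Illustrative alignment-mode priors (Gale-Church style), keyed by (n, m).
MODE_PRIORS = {(1, 1): 0.89, (1, 0): 0.005, (0, 1): 0.005,
               (2, 1): 0.045, (1, 2): 0.045, (2, 2): 0.01}

def is_hanzi(ch):
    # CJK Unified Ideographs block only. Simplified Chinese and Japanese
    # variant forms with different code points do not match (a simplification).
    return "\u4e00" <= ch <= "\u9fff"

def hanzi_similarity(zh, ja):
    # Hypothetical measure: Dice coefficient over the sets of ideographs on each side.
    a = {ch for ch in zh if is_hanzi(ch)}
    b = {ch for ch in ja if is_hanzi(ch)}
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def length_cost(lz, lj):
    # -log of the two-sided tail probability of the normalized length deviation.
    if lz == 0 and lj == 0:
        return 0.0
    mean = (lz + lj / C) / 2
    delta = (lj - lz * C) / math.sqrt(mean * S2)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(delta) / math.sqrt(2))))
    return -math.log(max(p, 1e-12))

def bead_cost(zh_seg, ja_seg, n, m):
    zh, ja = "".join(zh_seg), "".join(ja_seg)
    cost = -math.log(MODE_PRIORS[(n, m)]) + length_cost(len(zh), len(ja))
    # Add Hanzi evidence only for the mismatch-prone modes (n != 1 or m != 1)
    # that contain clauses on both sides.
    if (n != 1 or m != 1) and n > 0 and m > 0:
        cost += HANZI_WEIGHT * (1 - hanzi_similarity(zh, ja))
    return cost

def align(zh_clauses, ja_clauses):
    # Forward DP over (i, j) = clauses consumed so far on each side.
    I, J = len(zh_clauses), len(ja_clauses)
    INF = float("inf")
    best = [[INF] * (J + 1) for _ in range(I + 1)]
    back = [[None] * (J + 1) for _ in range(I + 1)]
    best[0][0] = 0.0
    for i in range(I + 1):
        for j in range(J + 1):
            if best[i][j] == INF:
                continue
            for n, m in MODE_PRIORS:
                ni, nj = i + n, j + m
                if ni > I or nj > J:
                    continue
                c = best[i][j] + bead_cost(zh_clauses[i:ni], ja_clauses[j:nj], n, m)
                if c < best[ni][nj]:
                    best[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from (I, J) to recover the aligned clause-index groups.
    beads, i, j = [], I, J
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return best[I][J], beads[::-1]
```

For example, `align(["今天天气", "很好"], ["今日は天気がとても良い"])` is expected to choose a single 2:1 bead. The shared ideographs 今 and 天 lower the cost of that mode enough to beat splitting the pair into a 1:1 bead plus a 1:0 bead. Restricting the Hanzi term to non-1:1 modes mirrors the abstract's focus on those modes, but it is a design assumption of this sketch.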