Chinese Ancient-Modern Sentence Alignment

  • Authors:
  • Zhun Lin;Xiaojie Wang

  • Affiliations:
  • School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China;School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China

  • Venue:
  • ICCS '07 Proceedings of the 7th international conference on Computational Science, Part II
  • Year:
  • 2007

Abstract

Bi-text alignment is useful for many Natural Language Processing tasks, such as machine translation, bilingual lexicography, and word sense disambiguation. Most previous research has focused on pairs of different languages. This paper presents a diachronic alignment of Ancient and Modern Chinese. Because of the long history of Chinese culture and Chinese writing, many Ancient Chinese texts are still waiting to be translated into Modern Chinese; moreover, the comparative study of Ancient and Modern Chinese is an important way to understand some characteristics of Modern Chinese. After describing some characteristics of Ancient-Modern Chinese bi-texts, we first investigate some statistical properties of an Ancient-Modern bi-text corpus, including a correlation test of text lengths between the two languages and a distribution test of the length-ratio data. We then pay closer attention to n-m (n > 1 or m > 1) alignment modes, which are prone to mismatch.
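The length statistics mentioned in the abstract can be illustrated with a minimal sketch. It assumes sentence length is measured in characters and that the corpus is available as (ancient, modern) sentence pairs. The function names `pearson` and `length_stats` are hypothetical, and the toy pairs are synthetic placeholders, not data from the paper. The sketch computes only the Pearson correlation and summary statistics of the length ratio; it does not reproduce the paper's specific tests.

```python
import math
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def length_stats(pairs):
    """Summarize character-length statistics for aligned sentence pairs.

    pairs: list of (ancient_sentence, modern_sentence) strings.
    Returns the length correlation plus the mean and sample standard
    deviation of the modern/ancient length ratio.
    """
    ancient = [len(a) for a, _ in pairs]
    modern = [len(m) for _, m in pairs]
    ratios = [m / a for a, m in zip(ancient, modern)]
    return {
        "r": pearson(ancient, modern),
        "ratio_mean": mean(ratios),
        "ratio_stdev": stdev(ratios),
    }


# Synthetic placeholder pairs (assumption): only the string lengths matter here.
toy_pairs = [
    ("A" * 4, "M" * 7),
    ("A" * 6, "M" * 10),
    ("A" * 9, "M" * 14),
    ("A" * 12, "M" * 20),
]
stats = length_stats(toy_pairs)
print(stats)
```

A strong correlation together with a tightly concentrated length ratio is the usual justification for length-based sentence alignment; whether the ratio follows a particular distribution would need a separate goodness-of-fit test on real data.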