Constructing of a large-scale Chinese-English parallel corpus

  • Authors:
  • Le Sun;Song Xue;Weimin Qu;Xiaofeng Wang;Yufang Sun

  • Affiliations:
  • Institute of Software, Beijing, P. R. China;Institute of Software, Beijing, P. R. China;Institute of Software, Beijing, P. R. China;Institute of Software, Beijing, P. R. China;Institute of Software, Beijing, P. R. China

  • Venue:
  • COLING '02 Proceedings of the 3rd workshop on Asian language resources and international standardization - Volume 12
  • Year:
  • 2002

Abstract

This paper describes the construction of a large-scale Chinese-English parallel corpus (more than 500,000 sentence pairs). The current status of Chinese corpora is reviewed, with emphasis on parallel corpora. The XML encoding principles for the Chinese-English parallel corpus are discussed. The sentence alignment algorithm used in this project is described, together with a computer-aided checking process. Finally, we present the design of the concordance for the parallel corpus and the prospects for further development.