Automatic construction of parallel English-Chinese corpus for cross-language information retrieval

  • Authors:
  • Jiang Chen;Jian-Yun Nie

  • Affiliations:
  • Université de Montréal, Montreal (Quebec), Canada;Université de Montréal, Montreal (Quebec), Canada

  • Venue:
  • ANLC '00 Proceedings of the sixth conference on Applied natural language processing
  • Year:
  • 2000

Abstract

A major obstacle to the construction of a probabilistic translation model is the lack of large parallel corpora. In this paper, we first describe a parallel text mining system that finds parallel texts automatically on the Web. The generated Chinese-English parallel corpus is then used to train a probabilistic translation model, which translates queries for Chinese-English cross-language information retrieval (CLIR). We discuss some problems in translation model training and present preliminary CLIR results.
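
To make the idea of training a probabilistic translation model on a parallel corpus concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which model is used, so the sketch assumes an IBM Model 1 style word-translation table estimated with EM. The function names `train_model1` and `best_translation`, the English-to-Chinese direction, and the three-sentence toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def train_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM (IBM Model 1 style).

    pairs: list of (source_tokens, target_tokens) aligned sentence pairs.
    Assumption: a real system would add a NULL source word and smoothing.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initial probabilities

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) link counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def best_translation(word, t):
    """Return the most probable target word for one source query word."""
    candidates = {f: p for (f, e), p in t.items() if e == word}
    return max(candidates, key=candidates.get)

# Hypothetical toy parallel corpus (English source, Chinese target).
pairs = [
    (["red", "car"], ["红", "车"]),
    (["red", "book"], ["红", "书"]),
    (["blue", "book"], ["蓝", "书"]),
]
t = train_model1(pairs)
query = ["blue", "book"]
translated_query = [best_translation(w, t) for w in query]
```

In a CLIR setting, a table of this kind would be used to map each query word to one or more weighted translations before retrieval. Keeping only the single best translation, as done here, is a simplification for the sake of a short example.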