A bootstrapping method for extracting bilingual text pairs

  • Authors:
  • Hiroshi Masuichi;Raymond Flournoy;Stefan Kaufmann;Stanley Peters

  • Affiliations:
  • Fuji Xerox Co., Ltd., Kanagawa, Japan;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
  • Year:
  • 2000

Abstract

This paper proposes a method for extracting bilingual text pairs from a comparable corpus. The basic idea of the method is to apply bootstrapping to an existing corpus-based cross-language information retrieval (CLIR) approach. We conducted preliminary tests with English and Japanese bilingual corpora. For the task of extracting translation pairs, the bootstrapping method led to much better results than a corpus-based CLIR method without bootstrapping, and the extracted translation pairs could be useful training data for improving the results of the corpus-based CLIR method.
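The bootstrapping idea can be illustrated with a minimal sketch. The abstract does not specify the underlying CLIR model, so everything here is an assumption made for illustration: a toy word co-occurrence lexicon stands in for the corpus-based CLIR method, and the names `train_lexicon`, `project`, `bootstrap`, and the `threshold` and `max_rounds` parameters are hypothetical. Each round retrains the lexicon on the pairs accepted so far, retrieves the best-matching target text for each remaining source text, and adds confident matches to the training data.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_lexicon(pairs):
    """Toy stand-in for corpus-based CLIR training (an assumption):
    count target words co-occurring with each source word."""
    lexicon = {}
    for src, tgt in pairs:
        for s in src.split():
            lexicon.setdefault(s, Counter()).update(tgt.split())
    return lexicon


def project(src, lexicon):
    """Map a source text into target-vocabulary space via the lexicon."""
    vec = Counter()
    for s in src.split():
        for t, count in lexicon.get(s, {}).items():
            vec[t] += count
    return vec


def bootstrap(seed_pairs, src_docs, tgt_docs, threshold=0.4, max_rounds=5):
    """Grow the set of text pairs by retraining on each round's confident matches."""
    pairs = list(seed_pairs)
    src_left, tgt_left = list(src_docs), list(tgt_docs)
    for _ in range(max_rounds):
        lexicon = train_lexicon(pairs)
        accepted = []
        for src in src_left:
            taken = {t for _, t in accepted}
            scored = [(cosine(project(src, lexicon), Counter(t.split())), t)
                      for t in tgt_left if t not in taken]
            if not scored:
                break
            score, best = max(scored)
            if score >= threshold:
                accepted.append((src, best))
        if not accepted:
            break  # no new confident pairs: stop bootstrapping
        pairs.extend(accepted)
        src_left = [s for s in src_left if s not in {a for a, _ in accepted}]
        tgt_left = [t for t in tgt_left if t not in {b for _, b in accepted}]
    return pairs


seed = [("a b", "x y")]
with_bootstrap = bootstrap(seed, ["a c", "c d"], ["x z", "z w"])
without_bootstrap = bootstrap(seed, ["a c", "c d"], ["x z", "z w"], max_rounds=1)
```

In this toy data, a single round pairs only "a c" with "x z"; the pair for "c d" becomes reachable only after the first extracted pair is fed back into training, which is the effect bootstrapping is meant to provide.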