Extracting word sequence correspondences with support vector machines

  • Authors:
  • Kengo Sato; Hiroaki Saito

  • Affiliations:
  • Keio University, Yokohama, Japan; Keio University, Yokohama, Japan

  • Venue:
  • COLING '02: Proceedings of the 19th International Conference on Computational Linguistics, Volume 1
  • Year:
  • 2002

Abstract

This paper proposes a method for learning and extracting word sequence correspondences from non-aligned parallel corpora with Support Vector Machines (SVMs). SVMs generalize well, rarely overfit the training samples, and can learn dependencies among features through a kernel function. The translation model in our method uses features based on a translation dictionary, the number of words, part-of-speech tags, constituent words, and neighboring words. In experiments on Japanese-English parallel corpora, the extracted word sequence correspondences achieved 81.1% precision and 69.0% recall. These results demonstrate that our method can reduce the cost of building translation dictionaries.
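The abstract names the feature groups but not how they are encoded, which kernel is used, or how the model is trained. As a rough illustration only, the Python sketch below assumes that each candidate pair of a source (Japanese) and target (English) word sequence becomes a sparse vector. That vector holds binary indicators plus one real-valued dictionary-overlap score and is scored with a polynomial-kernel SVM decision function f(x) = Σ α_i y_i K(x_i, x) + b. Several details are hypothetical, not taken from the paper: the function names `extract_features`, `dot`, `poly_kernel`, and `decision`, the feature naming scheme, the kernel degree, the romanized toy tokens, and the hand-set support vectors. A real system would obtain the support vectors and coefficients by training an SVM on labeled correspondences.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical types: a token is (surface form, part-of-speech tag);
# a feature vector is a sparse map from feature name to value.
Token = Tuple[str, str]
SparseVec = Dict[str, float]


def extract_features(
    src: Sequence[Token],
    tgt: Sequence[Token],
    src_context: Tuple[str, str],
    tgt_context: Tuple[str, str],
    dictionary: Dict[str, Set[str]],
) -> SparseVec:
    """Encode one candidate (source sequence, target sequence) pair.

    The feature groups follow the abstract (dictionary, length,
    part-of-speech, constituent words, neighbor words); the exact
    encoding here is an assumption made for illustration.
    """
    feats: SparseVec = {}
    # Dictionary feature: share of source words that have a dictionary
    # translation among the target words.
    tgt_words = {w for w, _ in tgt}
    hits = sum(1 for w, _ in src if dictionary.get(w, set()) & tgt_words)
    feats["dict_ratio"] = hits / len(src) if src else 0.0
    # Length features: number of words on each side, as indicators.
    feats[f"src_len={len(src)}"] = 1.0
    feats[f"tgt_len={len(tgt)}"] = 1.0
    # Part-of-speech and constituent-word indicators for both sides.
    for side, seq in (("src", src), ("tgt", tgt)):
        for word, pos in seq:
            feats[f"{side}_pos={pos}"] = 1.0
            feats[f"{side}_word={word}"] = 1.0
    # Neighbor-word indicators: the words just left and right of each sequence.
    feats[f"src_left={src_context[0]}"] = 1.0
    feats[f"src_right={src_context[1]}"] = 1.0
    feats[f"tgt_left={tgt_context[0]}"] = 1.0
    feats[f"tgt_right={tgt_context[1]}"] = 1.0
    return feats


def dot(a: SparseVec, b: SparseVec) -> float:
    """Sparse dot product, iterating over the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def poly_kernel(a: SparseVec, b: SparseVec, degree: int = 2) -> float:
    """Polynomial kernel (a.b + 1)^degree."""
    return (dot(a, b) + 1.0) ** degree


def decision(
    x: SparseVec,
    support_vectors: List[Tuple[SparseVec, float]],
    bias: float,
    degree: int = 2,
) -> float:
    """SVM decision value sum_i coef_i * K(sv_i, x) + b, with coef_i = alpha_i * y_i.

    A positive value means the pair is accepted as a correspondence.
    """
    return sum(c * poly_kernel(sv, x, degree) for sv, c in support_vectors) + bias


# Toy usage with romanized, hypothetical data.
dictionary = {"kuroi": {"black"}, "inu": {"dog"}}
src = [("kuroi", "ADJ"), ("inu", "NOUN")]
good = extract_features(src, [("black", "ADJ"), ("dog", "NOUN")],
                        ("<s>", "ga"), ("a", "runs"), dictionary)
bad = extract_features(src, [("runs", "VERB")],
                       ("<s>", "ga"), ("dog", "</s>"), dictionary)
# Hand-set support vectors standing in for a trained model (hypothetical).
svs = [(good, 1.0), (bad, -1.0)]
print(good["dict_ratio"], bad["dict_ratio"])  # 1.0 0.0
print(decision(good, svs, 0.0), decision(bad, svs, 0.0))  # 192.0 -105.0
```

With degree 2, expanding (a·b + 1)^2 yields products of pairs of features, such as a part-of-speech indicator combined with a neighbor-word indicator. This is one way an SVM can weigh the feature dependencies the abstract mentions without listing those conjunctions explicitly.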