Extracting word correspondences from bilingual corpora based on word co-occurrence information

  • Authors:
  • Hiroyuki Kaji; Toshiko Aizono

  • Affiliations:
  • Central Research Laboratory, Hitachi Ltd., Tokyo, Japan; Central Research Laboratory, Hitachi Ltd., Tokyo, Japan

  • Venue:
  • COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
  • Year:
  • 1996

Abstract

A new method has been developed for extracting word correspondences from a bilingual corpus. First, co-occurrence information for each word in both languages is extracted from the corpus. Then, the correlations between the co-occurrence features of the words are calculated pairwise with the assistance of a bilingual dictionary of basic words. Finally, the pairs of words with the highest correlations are selectively output. The method is applicable to rather small, unaligned corpora, and it can extract correspondences between compound words as well as simple words. An experiment using bilingual patent-specification corpora achieved 28% recall and 76% precision, which demonstrates that the method effectively reduces the cost of bilingual dictionary augmentation.
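
The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the co-occurrence window, the correlation measure, or the selection rule, so the sketch assumes sentence-level co-occurrence, a Jaccard overlap computed after mapping co-occurring words through the seed dictionary, and a fixed threshold. All function names, the `threshold` parameter, and the toy English/French data are hypothetical.

```python
from collections import defaultdict


def cooccurrence_sets(sentences):
    """Step 1: map each word to the set of words that share a sentence with it."""
    cooc = defaultdict(set)
    for sentence in sentences:
        words = set(sentence)
        for w in words:
            cooc[w] |= words - {w}
    return cooc


def correlation(src_cooc, tgt_cooc, seed_dict):
    """Step 2: Jaccard overlap (an assumed measure) between a source word's
    co-occurring words, translated via the seed dictionary, and a target
    word's co-occurring words."""
    translated = {t for s in src_cooc for t in seed_dict.get(s, ())}
    union = translated | tgt_cooc
    return len(translated & tgt_cooc) / len(union) if union else 0.0


def extract_pairs(src_sentences, tgt_sentences, seed_dict, threshold=0.5):
    """Step 3: for each source word missing from the seed dictionary, keep the
    best-scoring target word if its score reaches the (assumed) threshold."""
    src = cooccurrence_sets(src_sentences)
    tgt = cooccurrence_sets(tgt_sentences)
    pairs = []
    for s, s_cooc in src.items():
        if s in seed_dict:
            continue  # translation already known
        scored = [(correlation(s_cooc, t_cooc, seed_dict), t)
                  for t, t_cooc in tgt.items()]
        score, best = max(scored, default=(0.0, None))
        if best is not None and score >= threshold:
            pairs.append((s, best, score))
    return sorted(pairs, key=lambda p: -p[2])


# Toy, unaligned corpora: the sentences are never paired across languages.
src_sentences = [["valve", "opens", "pipe"], ["valve", "closes", "pipe"]]
tgt_sentences = [["vanne", "ferme", "tuyau"], ["vanne", "ouvre", "tuyau"]]
seed_dict = {"opens": ["ouvre"], "closes": ["ferme"], "pipe": ["tuyau"]}

pairs = extract_pairs(src_sentences, tgt_sentences, seed_dict)
print(pairs)
```

In this toy run the only word missing from the seed dictionary, "valve", is paired with "vanne" because their co-occurrence sets coincide once translated. Note that the sketch never uses sentence alignment, which mirrors the abstract's claim that unaligned corpora suffice; handling compound words would need an additional term-extraction step that is not shown here.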