Using bilingual ETD collections to mine phrase translations

  • Authors:
  • Ryan Richardson; Edward A. Fox

  • Affiliations:
  • Virginia Tech, Blacksburg, VA; Virginia Tech, Blacksburg, VA

  • Venue:
  • Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2007

Abstract

Phrase translation lists can enhance cross-language information retrieval. However, finding translations for technical phrases is difficult. Bilingual dictionaries have limited coverage of specialized fields, and even more limited coverage of technical phrases. Because phrases can have very specific meanings in technical fields, this limited coverage lowers the quality of translations produced by generic machine translation systems. We hypothesize that digital libraries of electronic theses and dissertations (ETDs) are a good source of technical phrase translations. We have acquired a collection of 3,086 Spanish ETDs about computer science from Scirus, the Universidad Nacional Autónoma de México (Mexico City), and Universidad de las Américas (Puebla). Pairing these with English ETDs from NDLTD gives us a comparable corpus of computing-related documents from which to mine phrase translations. We describe our method and its formative evaluation.