Identification of transliterated foreign words in Hebrew script

  • Authors: Yoav Goldberg; Michael Elhadad

  • Affiliations: Computer Science Department, Ben Gurion University of the Negev, Be'er Sheva, Israel (both authors)

  • Venue: CICLing'08: Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year: 2008

Abstract

We present a loosely supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require any lexicons or linguistic analysis tools for the source languages (Hebrew, in our case). It also does not require any manually annotated training data: we learn from noisy data acquired by over-generation. We report precision/recall results of 80/82 on a corpus of 4044 unique words, of which 368 are foreign.
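
To give a feel for what the reported figures imply, the following back-of-the-envelope sketch converts precision and recall into approximate word counts. It assumes that 80/82 are percentages and that both are measured against the 368 foreign words; the function name `implied_counts` is hypothetical, and the derived counts are rounded estimates, not numbers reported by the authors.

```python
def implied_counts(precision, recall, n_positive):
    """Estimate confusion counts from precision, recall and the number of
    true positives available in the corpus (assumption: rounded values)."""
    true_pos = round(recall * n_positive)       # foreign words correctly flagged
    flagged = round(true_pos / precision)       # all words flagged as foreign
    false_pos = flagged - true_pos              # native words wrongly flagged
    false_neg = n_positive - true_pos           # foreign words missed
    return true_pos, flagged, false_pos, false_neg


# Figures from the abstract: precision 80, recall 82, 368 foreign words.
tp, flagged, fp, fn = implied_counts(0.80, 0.82, 368)
print(f"correctly flagged: ~{tp}, total flagged: ~{flagged}")
print(f"false alarms: ~{fp}, missed foreign words: ~{fn}")
```

Under these assumptions, roughly 300 of the 368 foreign words are found, at the cost of a few dozen native words being flagged by mistake out of the 4044 unique words in the corpus.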