Letter level learning for language independent diacritics restoration

  • Authors:
  • Rada Mihalcea; Vivi Nastase

  • Affiliations:
  • University of North Texas, Denton, TX; University of Ottawa, Ottawa, ON

  • Venue:
  • COLING-02: Proceedings of the 6th Conference on Natural Language Learning, Volume 20
  • Year:
  • 2002

Abstract

This paper presents a method for diacritics restoration based on learning mechanisms that operate at the letter level. The method requires no tagging tools or resources other than raw text. This makes it language independent and particularly appealing for languages with few available resources. The algorithm was evaluated on four languages (Czech, Hungarian, Polish, and Romanian), with an average accuracy of over 98%.
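The central point of the abstract is that raw text alone supplies training data: stripping the diacritics from a text yields the input, and the original letters serve as labels. The sketch below illustrates this idea under stated assumptions and is not the authors' implementation. It assumes the context of each letter is a fixed window of stripped letters on each side. The learner is a simple frequency table that backs off to narrower windows, which stands in for whatever learning algorithm the paper actually uses. The names `strip_char` and `LetterWindowRestorer` and the `window` parameter are hypothetical.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_char(ch: str) -> str:
    """Remove combining marks from one character via Unicode NFD decomposition.

    Characters without a canonical decomposition (e.g. Polish 'ł') are
    returned unchanged, so this helper does not treat them as ambiguous.
    """
    base = "".join(c for c in unicodedata.normalize("NFD", ch)
                   if not unicodedata.combining(c))
    # Keep the original character unless stripping yields exactly one base letter.
    return base if len(base) == 1 else ch


class LetterWindowRestorer:
    """Hypothetical letter-level restorer (an assumption, not the paper's learner):
    predicts each letter's diacritized form from the stripped letters around it,
    backing off from wide to narrow context windows."""

    def __init__(self, window: int = 3):
        self.window = window
        # tables[k] maps a context of k stripped letters on each side
        # to counts of the original (diacritized) letter seen in that context.
        self.tables = [defaultdict(Counter) for _ in range(window + 1)]

    @staticmethod
    def _context(stripped: str, i: int, k: int) -> tuple:
        # Pad with '#' so contexts at the text boundaries keep a fixed width.
        left = stripped[max(0, i - k):i].rjust(k, "#")
        right = stripped[i + 1:i + 1 + k].ljust(k, "#")
        return (left, stripped[i], right)

    def train(self, raw_text: str) -> None:
        # Raw text is the only resource: its stripped form is the input side
        # and its original letters are the labels.
        text = unicodedata.normalize("NFC", raw_text)
        stripped = "".join(strip_char(c) for c in text)
        for i, original in enumerate(text):
            for k in range(self.window + 1):
                self.tables[k][self._context(stripped, i, k)][original] += 1

    def restore(self, stripped_text: str) -> str:
        out = []
        for i, ch in enumerate(stripped_text):
            choice = ch  # letters never seen in training are left unchanged
            # Try the widest context first, then back off to narrower ones.
            for k in range(self.window, -1, -1):
                counts = self.tables[k].get(self._context(stripped_text, i, k))
                if counts:
                    choice = counts.most_common(1)[0][0]
                    break
            out.append(choice)
        return "".join(out)


restorer = LetterWindowRestorer(window=3)
restorer.train("casă mare. masă mare.")
print(restorer.restore("lasa mare"))  # prints "lasă mare"
```

In this toy Romanian example, the unseen word "lasa" is restored to "lasă". The full three-letter window around its final letter never occurs in training, but the narrower two-letter window ("as" on the left, " m" on the right) matches both "casă" and "masă". Training pairs come only from stripping raw text, so the same code applies to any language whose diacritics Unicode encodes as combining marks. Letters such as Polish "ł", which have no canonical decomposition, would need an explicit mapping.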