A finite state and data-oriented method for grapheme to phoneme conversion

  • Authors: Gosse Bouma
  • Affiliations: Alfa-informatica, Rijksuniversiteit Groningen, Groningen, The Netherlands
  • Venue: NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
  • Year: 2000

Abstract

A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved by using transformation-based learning. The phoneme accuracy of the best system (using a large set of rule templates and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99%.
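
The leftmost longest-match idea behind the segmentation step can be illustrated with a minimal Python sketch. It is not the paper's finite-state implementation: the grapheme inventory `MULTI_LETTER_GRAPHEMES` and the function `segment` are hypothetical names chosen for illustration, and the inventory is a small, incomplete sample of Dutch multi-letter graphemes. The sketch scans a word from left to right and, at each position, takes the longest listed grapheme that matches, falling back to a single letter.

```python
# Hypothetical, incomplete inventory of multi-letter Dutch graphemes.
# Illustrative only; this is not the rule set used in the paper.
MULTI_LETTER_GRAPHEMES = {
    "sch", "ch", "ng", "ij", "ei", "ui", "oe", "eu",
    "ou", "au", "ie", "aa", "ee", "oo", "uu",
}


def segment(word, graphemes=MULTI_LETTER_GRAPHEMES):
    """Split `word` into graphemes by leftmost longest-match."""
    max_len = max(len(g) for g in graphemes)
    segments = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, down to length 2.
        for length in range(min(max_len, len(word) - i), 1, -1):
            candidate = word[i:i + length]
            if candidate in graphemes:
                segments.append(candidate)
                i += length
                break
        else:
            # No multi-letter grapheme starts here: emit a single letter.
            segments.append(word[i])
            i += 1
    return segments


if __name__ == "__main__":
    for w in ("school", "huis", "zingen"):
        print(w, segment(w))
```

With this sample inventory, "school" is split as sch-oo-l rather than s-ch-oo-l, because the longer match "sch" is preferred at the leftmost position. A grapheme-to-phoneme step would then map each segment to a phoneme, possibly conditioned on its context.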