Web-derived pronunciations

  • Authors:
  • Arnab Ghoshal;Martin Jansche;Sanjeev Khudanpur;Michael Riley;Morgan Ulinski

  • Affiliations:
  • Johns Hopkins University, Baltimore, MD 21218, USA;Google, Inc., New York, 10011, USA;Johns Hopkins University, Baltimore, MD 21218, USA;Google, Inc., New York, 10011, USA;Cornell University, Ithaca, NY 14853, USA

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

Pronunciation information is available in large quantities on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic transcriptions from ad-hoc ones. On a letter-to-phoneme task, we show improvements when using Web-derived rather than Pronlex pronunciations.
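
To make the extraction and normalization steps concrete, the following is a minimal sketch and not the authors' method. It assumes that a candidate pronunciation appears as a slash- or bracket-delimited string directly after the word it transcribes, and that normalization amounts to replacing a few ASCII look-alike characters with their IPA counterparts. The names CANDIDATE, extract_candidates, NORMALIZE, and normalize_ipa are hypothetical, and the mapping table is illustrative only.

```python
import re
import unicodedata

# Assumption: a word immediately followed by /.../ or [...] is a
# (word, pronunciation) candidate. Real Web pages are far messier.
CANDIDATE = re.compile(
    r"\b([A-Za-z][A-Za-z'-]*)\s*(?:/([^/\s]+)/|\[([^\]\s]+)\])"
)

def extract_candidates(text):
    """Return (word, pronunciation) pairs found in plain text."""
    pairs = []
    for match in CANDIDATE.finditer(text):
        word = match.group(1).lower()
        pron = match.group(2) or match.group(3)
        pairs.append((word, pron))
    return pairs

# Illustrative table: ASCII characters often typed in place of IPA symbols.
NORMALIZE = {
    "g": "\u0261",  # Latin small letter script g
    "'": "\u02c8",  # primary stress mark
    ":": "\u02d0",  # length mark
}

def normalize_ipa(pron):
    """Map ASCII look-alikes to IPA symbols after Unicode NFC composition."""
    pron = unicodedata.normalize("NFC", pron)
    return "".join(NORMALIZE.get(ch, ch) for ch in pron)

if __name__ == "__main__":
    sample = "The word tomato /təˈmeɪtoʊ/ and gnome [noʊm]."
    for word, pron in extract_candidates(sample):
        print(word, normalize_ipa(pron))
```

A filtering step, such as discarding candidates whose symbols fall outside an expected inventory, would sit between extraction and normalization; it is omitted here because the abstract does not specify the criteria used.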