An algorithm for high accuracy name pronunciation by parametric speech synthesizer

Authors:
Tony Vitale
Affiliations:
Digital Equipment Corporation
Venue:
Computational Linguistics
Year:
1991

Abstract

The lack of automatic and accurate pronunciation of personal names by parametric speech synthesizers has become a crucial limitation for applications within the telecommunications industry, since the technology is needed to provide new automated services such as reverse directory assistance (number to name).

Within text-to-speech technology, however, it was not possible to offer such functionality. This was due to the inability of a text-to-speech device optimized for a specific language (e.g., American English) to accurately pronounce names that originate from very different language families. A telephone book from virtually any section of the country will contain names from scores of languages as diverse as English and Mandarin, French and Japanese, Irish and Polish. All such non-Anglo-Saxon names have traditionally been mispronounced by speech synthesizers, resulting in gross errors and unintelligible speech.

This paper describes how an algorithm for high accuracy name pronunciation was implemented in software based on a combination of cryptanalysis, statistics, and linguistics. The algorithm behind the utility is a two-stage procedure: (1) the decoding of the name to determine its etymological grouping; and (2) the application of specific letter-to-sound rules (both segmental rules and stress-assignment rules) that give the synthesizer parameters sufficient additional information to pronounce the name accurately, as a typical speaker of American English would. Default language and thresholds are settable parameters and are also described. While the complexity of the software is invisible to application writers as well as users, this functionality now makes possible the automation of highly accurate name pronunciation by parametric speech synthesizer.
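
The two-stage procedure can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how names are decoded, so the sketch assumes character-trigram overlap as the statistical component. The training names, rule tables, function names, and the 0.5 threshold are all hypothetical toy values. Stress assignment is omitted. The settable default language and threshold appear as the `default` and `threshold` parameters.

```python
from collections import Counter

# Hypothetical toy training names per etymological group (illustrative only).
TRAINING = {
    "english": ["smith", "johnson", "williams", "brown", "taylor"],
    "polish": ["kowalski", "nowak", "wisniewski", "lewandowski", "kaminski"],
    "japanese": ["tanaka", "suzuki", "yamamoto", "nakamura", "kobayashi"],
}

# Hypothetical segmental rules as (spelling, respelling) pairs, longest first.
# These are rough respellings, not real synthesizer phoneme codes.
RULES = {
    "english": [],
    "polish": [("sz", "sh"), ("cz", "ch"), ("w", "v")],
    "japanese": [],
}


def trigrams(name):
    """Return the character trigrams of a name, with boundary markers."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_profiles(training):
    """Count the trigrams seen in each group's training names."""
    return {
        group: Counter(t for name in names for t in trigrams(name))
        for group, names in training.items()
    }


def identify_group(name, profiles, default="english", threshold=0.5):
    """Stage 1: pick the group whose profile covers most of the name's trigrams.

    Falls back to `default` when the best score is below `threshold`.
    """
    grams = trigrams(name)
    if not grams:
        return default
    scores = {
        group: sum(1 for g in grams if g in profile) / len(grams)
        for group, profile in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else default


def apply_rules(name, rules):
    """Stage 2: rewrite the spelling left to right using the group's rules."""
    name = name.lower()
    out, i = [], 0
    while i < len(name):
        for src, dst in rules:
            if name.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            # No rule matched at this position: copy the letter unchanged.
            out.append(name[i])
            i += 1
    return "".join(out)


def pronounce(name, profiles, default="english", threshold=0.5):
    """Run both stages and return (group, respelled name)."""
    group = identify_group(name, profiles, default, threshold)
    return group, apply_rules(name, RULES[group])


profiles = build_profiles(TRAINING)
print(pronounce("Nowakowski", profiles))  # ('polish', 'novakovski')
```

A name whose trigrams match no profile, such as "xyz", scores below the threshold for every group and is handled by the default language's rules. This mirrors the role the abstract gives to the settable default and thresholds.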