Automatic acquisition of names using speak and spell mode in spoken dialogue systems

Authors:
Grace Chung; Stephanie Seneff; Chao Wang
Affiliations:
Corporation for National Research Initiatives, Reston, VA; MIT Laboratory for Computer Science, Cambridge, MA; MIT Laboratory for Computer Science, Cambridge, MA
Venue:
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Year:
2003

Abstract

This paper describes a novel multi-stage recognition procedure for deducing the spelling and pronunciation of an open set of names. The overall goal is the automatic acquisition of unknown words in a human-computer conversational system. The names are spoken and spelled in a single utterance, achieving a concise and natural dialogue flow. The first recognition pass extracts letter hypotheses from the spelled part of the waveform and maps them to phonemic hypotheses via a hierarchical sublexical model capable of generating grapheme-phoneme mappings. A second recognition pass determines the name by combining information from the spoken and spelled parts of the waveform, augmented with language model constraints. The procedure is integrated into a spoken dialogue system where users are asked to enroll their names for the first time. The acquisition process is implemented in multiple parallel threads for real-time operation. Subsequent to inducing the spelling and pronunciation of a new name, a series of operations automatically updates the recognition and natural language systems to immediately accommodate the new word. Experiments show promising results for letter and phoneme accuracies on a preliminary dataset.
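
The two-pass idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the letter-to-phoneme table, the scores, the edit-distance penalty, and every function name below are hypothetical stand-ins chosen for illustration. In particular, the per-letter table replaces the hierarchical sublexical model, and the language model constraints of the second pass are omitted.

```python
# Toy illustration of a two-pass "speak and spell" name acquisition.
# All names, tables, and scores here are assumptions, not from the paper.

# Hypothetical per-letter phoneme table (stand-in for a sublexical model).
TOY_LETTER_TO_PHONEME = {
    "c": "k", "h": "hh", "a": "ae", "u": "ah", "n": "n", "g": "g",
}


def letters_to_phonemes(spelling):
    """Map a spelling hypothesis to a phoneme sequence, one phoneme per letter."""
    return [TOY_LETTER_TO_PHONEME[letter] for letter in spelling]


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def second_pass(letter_hyps, spoken_phonemes, weight=1.0):
    """Rescore first-pass letter hypotheses against the spoken part.

    letter_hyps: list of (spelling, score) pairs from the spelled part.
    spoken_phonemes: phonemes assumed to be recognized from the spoken part.
    Returns (spelling, phonemes, combined_score) for the best hypothesis.
    """
    rescored = []
    for spelling, letter_score in letter_hyps:
        phonemes = letters_to_phonemes(spelling)
        penalty = weight * edit_distance(phonemes, spoken_phonemes)
        rescored.append((spelling, phonemes, letter_score - penalty))
    return max(rescored, key=lambda hyp: hyp[2])


# Assumed first-pass output: the spelled part slightly favors "chang".
LETTER_HYPS = [("chang", -1.0), ("chung", -1.2)]
# Assumed phonemes heard in the spoken part of the same utterance.
SPOKEN = ["k", "hh", "ah", "n", "g"]

best = second_pass(LETTER_HYPS, SPOKEN)
print(best[0], best[1])
```

In this toy setup the spelled part alone ranks "chang" first, but the spoken part disagrees with its derived pronunciation by one phoneme, so the combined score selects "chung". The sketch only shows how evidence from the two parts of the utterance can be combined; a real system would score acoustic hypotheses rather than compare symbol strings.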