Automatic acquisition of names using speak and spell mode in spoken dialogue systems

  • Authors:
  • Grace Chung;Stephanie Seneff;Chao Wang

  • Affiliations:
  • Corporation for National Research Initiatives, Reston, VA;MIT Laboratory for Computer Science, Cambridge, MA;MIT Laboratory for Computer Science, Cambridge, MA

  • Venue:
  • NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a novel multi-stage recognition procedure for deducing the spelling and pronunciation of an open set of names. The overall goal is the automatic acquisition of unknown words in a human computer conversational system. The names are spoken and spelled in a single utterance, achieving a concise and natural dialogue flow. The first recognition pass extracts letter hypotheses from the spelled part of the waveform and maps them to phonemic hypotheses via a hierarchical sublexical model capable of generating graphemephoneme mappings. A second recognition pass determines the name by combining information from the spoken and spelled part of the waveform, augmented with language model constraints. The procedure is integrated into a spoken dialogue system where users are asked to enroll their names for the first time. The acquisition process is implemented in multiple parallel threads for real-time operation. Subsequent to inducing the spelling and pronunciation of a new name, a series of operations automatically updates the recognition and natural language systems to immediately accommodate the new word. Experiments show promising results for letter and phoneme accuracies on a preliminary dataset.