The CMU Phoenix system is an experiment in understanding spontaneous speech. It has been implemented for the Air Travel Information Service task. In this task, casual users are asked to obtain information from a database of air travel information. Users are not given a vocabulary, grammar, or set of sentences to read; they compose queries themselves in a spontaneous manner. This task presents speech recognizers with many new problems compared to the Resource Management task. Not only is the speech not fluent, but the vocabulary and grammar are open. Also, the task is not just to produce a transcription but to produce an action: retrieving data from the database. Taking such actions requires parsing and "understanding" the utterance. Word error rate is not as important as utterance understanding rate.

Phoenix attempts to deal with phenomena that occur in spontaneous speech. Unknown words, restarts, repeats, and poorly formed or unusual grammar are common in spontaneous speech and are very disruptive to standard recognizers. These events lead to misrecognitions, which often cause a total parse failure. Our strategy is to apply grammatical constraints at the phrase level and to use semantic rather than lexical grammars. Semantics provide more constraint than parts of speech and must ultimately be dealt with in order to take actions. Applying constraints at the phrase level is more flexible than recognizing sentences as a whole, while providing much more constraint than word-spotting. Restarts and repeats most often fall between phrase occurrences, so individual phrases can still be recognized correctly. Poorly constructed grammar often consists of well-formed phrases and is often semantically well-formed; it is only syntactically incorrect. We associate phrases by frame-based semantics. Phrases represent word strings that can fill slots in frames. The slots represent information which the frame is able to act on.

The current Phoenix system uses a bigram language model with the Sphinx speech recognition system. The top-scoring word string is passed to a flexible frame-based parser. The parser assigns phrases (word strings) from the input to slots in frames. The slots represent information content needed for the frame. A beam of frame hypotheses is produced, and the best-scoring one is used to produce an SQL query.
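
The parsing strategy described above can be illustrated with a minimal Python sketch. It is not the Phoenix implementation. Regular expressions stand in for the semantic phrase grammars, the score is simply the number of input words covered by filled slots, and every name in it (the slot and frame names, the `flights` and `fares` tables, their columns, and the rule that the last phrase wins after a restart) is an assumption made for illustration.

```python
import re

# Hypothetical word classes for a toy air-travel domain.
CITIES = r"(?:boston|denver|pittsburgh|atlanta)"
DAYS = r"(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday)"

# Phrase-level semantic patterns: each one recognizes a short word string
# that can fill a slot. Words matching no pattern are simply skipped.
PHRASES = {
    "flight_request": re.compile(r"\b(?:flights?|fly)\b"),
    "fare_request": re.compile(r"\b(?:how much|fares?|cost|cheapest)\b"),
    "origin": re.compile(rf"\bfrom {CITIES}\b"),
    "destination": re.compile(rf"\bto {CITIES}\b"),
    "depart_day": re.compile(rf"\bon {DAYS}\b"),
}

# Frames list the slots they can act on (assumed, not from the paper).
FRAMES = {
    "flight": ["flight_request", "origin", "destination", "depart_day"],
    "fare": ["fare_request", "origin", "destination"],
}

# Hypothetical mapping from frames and slots to database tables and columns.
TABLES = {"flight": "flights", "fare": "fares"}
COLUMNS = {"origin": "origin", "destination": "destination", "depart_day": "day"}


def fill_slots(word_string):
    """Map each slot to the last phrase in the input that matches it."""
    text = word_string.lower()
    slots = {}
    for slot, pattern in PHRASES.items():
        phrases = pattern.findall(text)
        if phrases:
            # Assumption: after a restart, the later phrase is the intended one.
            slots[slot] = phrases[-1]
    return slots


def rank_frames(slots):
    """Return (score, frame, filled_slots) hypotheses, best first."""
    hypotheses = []
    for frame, slot_names in FRAMES.items():
        filled = {s: slots[s] for s in slot_names if s in slots}
        score = sum(len(phrase.split()) for phrase in filled.values())
        hypotheses.append((score, frame, filled))
    hypotheses.sort(key=lambda h: h[0], reverse=True)
    return hypotheses


def to_sql(frame, filled):
    """Build a parameterized query from the slots that map to columns."""
    conditions, params = [], []
    for slot, column in COLUMNS.items():
        if slot in filled:
            conditions.append(f"{column} = ?")
            params.append(filled[slot].split()[-1])  # the city or day word
    sql = f"SELECT * FROM {TABLES[frame]}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def understand(word_string):
    """Fill slots, rank the frame hypotheses, and query from the best one."""
    _, frame, filled = rank_frames(fill_slots(word_string))[0]
    return to_sql(frame, filled)
```

Under these assumptions, `understand("show me flights from boston uh from denver to atlanta on monday please")` selects the flight frame and returns a query on `flights` with the parameters `["denver", "atlanta", "monday"]`. The filler words and the restarted origin phrase do not cause a parse failure, because constraints are applied per phrase rather than to the sentence as a whole.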