We present a system that augments any unmodified Java application with an adaptive speech interface. The augmented system learns to associate spoken words and utterances with interface actions such as button clicks. Speech learning runs continuously, searching for correlations between what the user says and what the user does, so training the interface is seamlessly integrated with using it. As the user performs normal actions, she can optionally describe aloud what she is doing. Because the system uses a phoneme recognizer, it can learn new speech commands quickly. Speech commands are chosen by the user and can be recognized robustly thanks to accurate phonetic modelling of the user's utterances and the small vocabulary learned for a single application. After only a few examples, speech commands can replace mouse clicks; in effect, selected interface functions migrate from keyboard and mouse to speech. We demonstrate the usefulness of this approach by augmenting jfig, a drawing application, where speech commands save the user from the distraction of having to use a tool palette.
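The learning loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the class names, the ARPAbet-style phoneme tokens, and the relative edit-distance threshold are all assumptions. The idea is that each narrated action stores a (phoneme sequence, action) pair, and later utterances are matched to the nearest stored example by phonetic edit distance, which is tractable precisely because the per-application vocabulary is small.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme token sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[m][n]


class SpeechActionLearner:
    """Illustrative learner: accumulates (phoneme sequence, action) examples
    as the user narrates normal GUI actions, then matches later utterances
    to the phonetically closest stored example."""

    def __init__(self, max_relative_distance=0.4):
        # Threshold is an assumed tuning parameter, not from the paper.
        self.examples = []
        self.max_relative_distance = max_relative_distance

    def observe(self, phonemes, action):
        """Record one narrated example: the user said `phonemes` while
        performing `action` (e.g. clicking a palette button)."""
        self.examples.append((list(phonemes), action))

    def recognize(self, phonemes):
        """Return the action whose stored utterance is phonetically closest,
        or None if nothing is close enough."""
        phonemes = list(phonemes)
        best_action, best_score = None, float("inf")
        for example, action in self.examples:
            score = edit_distance(phonemes, example) / max(len(example), 1)
            if score < best_score:
                best_action, best_score = action, score
        return best_action if best_score <= self.max_relative_distance else None
```

For example, after observing the user say "delete" (phonemes D EH L IY T) while clicking the delete button, a slightly misrecognized later utterance such as D EH L EY T still maps to the same action, while an unrelated utterance maps to nothing.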