Building an application framework for speech and pen input integration in multimodal learning interfaces

  • Authors:
  • M. T. Vo; C. Wood

  • Affiliations:
  • Interactive Systems Laboratories, Carnegie Mellon University, Pittsburgh, PA, USA

  • Venue:
  • ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 06
  • Year:
  • 1996

Abstract

While significant advances have been made in improving the performance of speech, gesture, and handwriting recognition, speech- and pen-based systems have still not found broad acceptance in everyday life. One reason for this is the inflexibility of each input modality when used alone. Human communication is natural and flexible because we can draw on a multiplicity of communication signals working in concert, supplying complementary information or increasing robustness through redundancy. We present a multimodal interface capable of jointly interpreting speech, pen-based gestures, and handwriting in the context of an appointment scheduling application. The interpretation engine, based on semantic frame merging, correctly interprets 80% of a multimodal data set assuming perfect speech and gesture/handwriting recognition; in the presence of recognition errors, interpretation performance falls to the range of 35-62%. A dialog processing scheme uses task domain knowledge to guide the user in supplying information and permits human-computer interactions to span several related multimodal input events.
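
The semantic frame merging the abstract mentions can be illustrated with a minimal sketch. The Python code below is not from the paper: the Frame class, the slot names, and the confidence-based conflict rule are all assumptions, intended only to show how partial interpretations from speech and pen input might be combined into one joint frame.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical semantic frame: a bag of named slots plus a recognizer
# confidence. Slot names used below are illustrative, not from the paper.
@dataclass
class Frame:
    slots: Dict[str, Any] = field(default_factory=dict)
    confidence: float = 1.0

def merge_frames(speech: Frame, pen: Frame) -> Frame:
    """Combine two partial frames into one joint interpretation.

    Complementary slots are unioned; when both modalities fill the same
    slot with different values, the higher-confidence modality wins.
    """
    merged = dict(speech.slots)
    for name, value in pen.slots.items():
        if name not in merged:
            merged[name] = value      # complementary information
        elif merged[name] != value and pen.confidence > speech.confidence:
            merged[name] = value      # resolve redundant-but-conflicting slot
    return Frame(merged, min(speech.confidence, pen.confidence))

# Example: "Move this to Friday" spoken while circling an appointment.
speech_frame = Frame({"action": "reschedule", "day": "Friday"}, confidence=0.8)
pen_frame = Frame({"object": "appointment_42"}, confidence=0.9)

joint = merge_frames(speech_frame, pen_frame)
print(joint.slots)
# {'action': 'reschedule', 'day': 'Friday', 'object': 'appointment_42'}
```

A working system would additionally need to align input events in time and constrain which frames are allowed to merge; the sketch elides both.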