Fusion and fission: improved MMIA for multi-modal HCI based on WPS and voice-XML

  • Authors:
  • Jung-Hyun Kim; Kwang-Seok Hong

  • Affiliations:
  • School of Information and Communication Engineering, Sungkyunkwan University, Suwon, KyungKi-do, Korea (both authors)

  • Venue:
  • ICOST'07 Proceedings of the 5th international conference on Smart homes and health telematics
  • Year:
  • 2007


Abstract

This paper implements a Multi-Modal Instruction Agent (hereinafter, MMIA) with synchronization between the audio and gesture modalities, and proposes improved fusion and fission rules that depend on the SNNR (Signal Plus Noise to Noise Ratio) and a fuzzy value, built on an embedded KSSL (Korean Standard Sign Language) recognizer using the WPS (Wearable Personal Station) and Voice-XML. Our approach fuses and recognizes sentence- and word-based instruction models represented by speech and KSSL, then fissions the recognition result according to a weight decision rule and translates it into synthetic speech and a visual illustration (graphical display on an HMD, Head-Mounted Display) in real time. To validate the approach, we evaluate the MMIA's average recognition rates and recognition time. In the experiments, the average recognition rates for the prescribed 65 sentential and 156 word instruction models were 94.33% and 96.85% in clean environments, and 92.29% and 92.91% in noisy environments. The average recognition time was approximately 0.36 ms in both environments.
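
The abstract describes an SNNR- and fuzzy-value-driven weight decision rule that fuses the speech and KSSL channels and fissions the result to TTS and HMD outputs, but it does not give the rule itself. The sketch below is a minimal illustration of one plausible such rule, not the authors' method: the function names, the 20 dB clean-speech reference, and the confidence normalization are all assumptions for illustration only.

```python
import math

def snnr_db(signal_plus_noise_power: float, noise_power: float) -> float:
    """Signal Plus Noise to Noise Ratio in dB: 10 * log10((S + N) / N)."""
    return 10.0 * math.log10(signal_plus_noise_power / noise_power)

def fusion_weights(snnr: float, fuzzy_value: float,
                   snnr_clean_db: float = 20.0) -> tuple[float, float]:
    """Hypothetical weight decision: trust the speech channel in clean
    audio and shift weight toward the KSSL (gesture) channel as noise
    rises. `fuzzy_value` in [0, 1] is the gesture recognizer's confidence;
    the 20 dB clean-speech reference is an assumed parameter."""
    audio_conf = min(max(snnr / snnr_clean_db, 0.0), 1.0)
    w_speech = audio_conf / (audio_conf + fuzzy_value)
    return w_speech, 1.0 - w_speech

def fission(result: str, w_speech: float, w_gesture: float) -> list[str]:
    """Route the fused recognition result to both output channels,
    ordering them by the dominant modality weight."""
    tts, hmd = f"TTS: {result}", f"HMD: {result}"
    return [tts, hmd] if w_speech >= w_gesture else [hmd, tts]

if __name__ == "__main__":
    # Noisy audio (about 6 dB SNNR) with a confident gesture reading:
    # the weight shifts toward the KSSL channel.
    w_s, w_g = fusion_weights(snnr_db(4.0, 1.0), fuzzy_value=0.8)
    print(fission("TURN ON THE LIGHT", w_s, w_g))
```

In this toy run the weight decision favors the gesture channel (roughly 0.73 vs. 0.27), so the HMD rendering is listed first; a real deployment would tune the threshold and normalization against measured noise conditions.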