Concept-based evidential reasoning for multimodal fusion in human-computer interaction

  • Authors:
  • B. Srinath Reddy; Otman A. Basir

  • Affiliations:
  • Electrical and Computer Engineering, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada N2L 3G1 (both authors)

  • Venue:
  • Applied Soft Computing
  • Year:
  • 2010

Abstract

Considerable research has been done on using information from multiple modalities, such as hand gestures, facial gestures, or speech, for better interaction between humans and computers, and many promising human-computer interfaces (HCI) have been developed in recent years. However, most current HCI systems have a few drawbacks: firstly, they are highly dependent on the performance of individual sensors. Secondly, the fusion of information from these sensors tends to ignore the semantic nature of the modalities, which may reinforce or clarify each other over time. Finally, they are not robust enough at representing the imprecise nature of human gestures, since individual gestures are highly ambiguous in themselves. In this paper, we propose an approach for the semantic fusion of different input modalities, based on transferable belief models. We show that this approach allows for a better representation of the ambiguity involved in recognizing gestures. Ambiguity is resolved by combining the beliefs of the individual sensors on the input information to form new extended concepts, based on a pre-defined domain-specific knowledge base represented by conceptual graphs. We apply this technique to a multimodal system consisting of a hand gesture recognition sensor and a brain-computer interface. It is shown that the technique can successfully combine individual gestures obtained from the two sensors to form meaningful concepts and resolve ambiguity. An advantage of this approach is that it remains robust even if one of the sensors performs poorly or has no input. Another important feature is its scalability: additional input modalities, such as speech or facial gestures, can be integrated into the system at minimal cost to form a comprehensive HCI system.
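
The following is a minimal sketch of the kind of belief combination that underlies the transferable belief model the abstract refers to, namely the unnormalized conjunctive rule applied to two sensors' basic belief assignments. The gesture labels, mass values, and function name are illustrative assumptions, not taken from the paper, which additionally maps the fused beliefs onto concepts in a conceptual-graph knowledge base.

from itertools import product

def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive rule of the transferable belief model.

    m1, m2: basic belief assignments, given as dicts mapping frozensets of
    hypotheses (focal elements) to mass. Mass assigned to the empty set is
    kept, since in the TBM it represents conflict between the sources.
    """
    combined = {}
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b  # intersection of the two focal elements
        combined[inter] = combined.get(inter, 0.0) + ma * mb
    return combined

# Hypothetical gesture hypotheses (illustrative only).
SELECT, MOVE, ROTATE = "select", "move", "rotate"

# Hand-gesture sensor: ambiguous between "select" and "move".
m_hand = {frozenset([SELECT, MOVE]): 0.7,
          frozenset([SELECT, MOVE, ROTATE]): 0.3}

# Brain-computer interface: leans toward "select".
m_bci = {frozenset([SELECT]): 0.6,
         frozenset([SELECT, MOVE, ROTATE]): 0.4}

m_fused = conjunctive_combination(m_hand, m_bci)
for focal, mass in sorted(m_fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal) or "conflict (empty set)", round(mass, 3))

In this toy example, the BCI's evidence sharpens the hand sensor's ambiguous reading, so most of the combined mass ends up on the single gesture "select"; any mass left on the empty set would signal conflict between the two sensors, and a sensor with no input can simply contribute all of its mass to the full hypothesis set, leaving the other sensor's beliefs unchanged.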