A multimodal people recognition system for an intelligent environment

  • Authors:
  • Salvatore M. Anzalone;Emanuele Menegatti;Enrico Pagello;Rosario Sorbello;Yuichiro Yoshikawa;Hiroshi Ishiguro

  • Affiliations:
  • Intelligent Autonomous Systems Laboratory, Dep. of Information Engineering, Faculty of Engineering, Padua University, Padua, Italy;Intelligent Autonomous Systems Laboratory, Dep. of Information Engineering, Faculty of Engineering, Padua University, Padua, Italy;Intelligent Autonomous Systems Laboratory, Dep. of Information Engineering, Faculty of Engineering, Padua University, Padua, Italy;Robotic Laboratory, Dep. of Computer Science, Faculty of Engineering, University of Palermo, Palermo, Italy;Intelligent Robotics Laboratory, Dep. of Systems Innovation, Graduate School of Engineering Science, Osaka University, Osaka, Japan;Intelligent Robotics Laboratory, Dep. of Systems Innovation, Graduate School of Engineering Science, Osaka University, Osaka, Japan

  • Venue:
  • AI*IA'11 Proceedings of the 12th international conference on Artificial intelligence around man and beyond
  • Year:
  • 2011

Abstract

This paper presents a multimodal system for recognizing people in intelligent environments. Users are identified and tracked by detecting and recognizing their voices and faces through cameras and microphones distributed around the environment. The multimodal approach was chosen to obtain a flexible, inexpensive, yet reliable system built from consumer electronics. Voice features are extracted through short-time spectrum analysis, while face features are extracted using the eigenfaces technique. Recognition is performed by Support Vector Machines, one per modality, that learn and classify the features of each person; the bindings between modalities are learnt through a cross-anchoring learning rule based on the mutual exclusivity selection principle. The system was developed using NMM, a middleware that splits sensor processing across several software nodes, making the system scalable in the number of cameras and microphones.
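
The two feature extraction steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, frame length, hop size, image size, and number of components are all assumptions chosen for illustration, and random arrays stand in for real audio and face images. The per-modality Support Vector Machines and the cross-anchoring rule are omitted; the resulting feature vectors would be the inputs to those classifiers.

```python
import numpy as np


def short_time_spectrum(signal, frame_len=256, hop=128):
    """Magnitude spectra of overlapping, Hann-windowed frames (assumed sizes)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One row per frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))


def fit_eigenfaces(faces, n_components):
    """faces: (n_images, n_pixels). Returns the mean face and the top axes."""
    mean = faces.mean(axis=0)
    # Rows of vt are the principal directions ("eigenfaces") of the centred data.
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:n_components]


def project_face(face, mean, eigenfaces):
    """Coordinates of one flattened face image in the eigenface basis."""
    return eigenfaces @ (face - mean)


# Synthetic stand-ins for sensor data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)    # e.g. one second at an assumed 16 kHz
faces = rng.random((20, 32 * 32))     # 20 flattened 32x32 grey-level images

voice_features = short_time_spectrum(audio)
mean_face, eigenfaces = fit_eigenfaces(faces, n_components=8)
face_features = project_face(faces[0], mean_face, eigenfaces)
```

Each row of `voice_features` describes one short audio frame, and `face_features` is a compact vector for one image; in a system like the one described, one classifier per modality would be trained on such vectors.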