Long-time span acoustic activity analysis from far-field sensors in smart homes

  • Authors:
  • Jing Huang; Xiaodan Zhuang; Vit Libal; Gerasimos Potamianos

  • Affiliations:
  • IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A. (all authors)

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

Smart homes for the aging population have recently started attracting the attention of the research community. One problem of interest is that of monitoring the activities of daily living (ADLs) of the elderly, in order to help identify critical problems and improve their protection and general well-being. In this paper, we report on our initial attempts to recognize such activities, based on input from networks of far-field microphones distributed inside the home. We propose two approaches to the problem: The first models the entire activity, which typically covers a long time span, with a single statistical model, for example a hidden Markov model (HMM), a Gaussian mixture model (GMM), or GMM supervectors in conjunction with support vector machines (SVMs). The second is a two-step approach: it first performs acoustic event detection (AED) to locate distinctive events characteristic of the ADLs, and then applies a post-processing stage that employs activity-specific language models (LMs) to classify the detected event sequences into ADLs. Experiments are reported on a corpus containing a small number of acted ADLs, collected as part of the Netcarity Integrated Project inside a two-room smart home. Our results show that SVM GMM-supervector modeling improves six-class ADL classification accuracy to 76%, compared to 56% achieved by the GMMs, while also outperforming HMMs by 8% absolute. Preliminary results from LM scoring of acoustic event sequences are comparable to those from GMMs on a three-class ADL classification task.
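The GMM-supervector approach mentioned in the abstract can be illustrated with a minimal sketch: fit a small GMM on each recording's frame-level features, stack the component means into a fixed-length "supervector", and train an SVM on those vectors. This is not the authors' implementation; the synthetic features, dimensions, and two toy activity classes below are illustrative assumptions.

```python
# Hedged sketch of GMM-supervector features fed to an SVM.
# Synthetic 13-dim "frame features" stand in for real acoustic features.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def supervector(frames, n_components=4):
    """Fit a small GMM on one recording's frames and stack its means
    into a fixed-length vector (n_components * n_features)."""
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(frames)
    return gmm.means_.ravel()

def make_clip(offset):
    """Toy 'recording': 200 frames of 13-dim features whose mean
    depends on the (hypothetical) activity class."""
    return rng.normal(loc=offset, scale=1.0, size=(200, 13))

# Three training clips per toy activity class (offsets 0 and 3).
X = np.array([supervector(make_clip(c)) for c in [0, 0, 0, 3, 3, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM on the supervectors classifies unseen clips.
clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict([supervector(make_clip(0)), supervector(make_clip(3))])
print(pred)
```

In practice the paper uses real far-field audio features rather than synthetic frames, and the supervectors are typically derived by adapting a shared background GMM per recording; the fixed-length representation is what makes SVM classification of variable-length activities possible.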