Robust voice activity detection for social sensing

Authors:
Sebastian Feese; Gerhard Tröster
Affiliations:
ETH Zurich, Zurich, Switzerland; ETH Zurich, Zurich, Switzerland
Venue:
Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication
Year:
2013

Abstract

Speech is a rich source of personal information, and speech detection is therefore a fundamental function of many social sensing applications. Even the mere amount of speech present in our surroundings can indicate our sociability and communication patterns. In this work, we present and evaluate a speech detection approach based on dictionary learning and sparse signal representation. By transforming noisy audio data into a sparse representation over a dictionary learned from clean speech data, we show that speech and non-speech can be discriminated even in low signal-to-noise conditions with up to 92% accuracy. In addition to an evaluation with simulated data, we evaluate the algorithm on a real-world data set recorded during firefighting missions. We show that the speech activity of firefighters can be detected with 85% accuracy using a smartphone placed in the firefighting jacket.
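
The idea of coding a frame sparsely over a speech dictionary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the dictionary is a handful of fixed sinusoidal atoms rather than one learned from clean speech, the sparse coder is plain matching pursuit, and the decision rule (a hypothetical `threshold` on the fraction of frame energy captured by the sparse code) merely stands in for whatever features and classifier the authors use.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def matching_pursuit(frame, atoms, n_nonzero):
    """Greedy sparse coding: repeatedly pick the atom most correlated
    with the residual and subtract its contribution."""
    residual = list(frame)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        corr = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

def captured_energy(frame, atoms, n_nonzero=2):
    """Fraction of the frame's energy explained by the sparse code."""
    _, residual = matching_pursuit(frame, atoms, n_nonzero)
    energy = dot(frame, frame)
    return 1.0 - dot(residual, residual) / energy if energy > 0 else 0.0

def is_speech(frame, atoms, threshold=0.5):
    """Hypothetical decision rule: a frame counts as speech when the
    speech dictionary explains most of its energy."""
    return captured_energy(frame, atoms) >= threshold

# Toy stand-in for a dictionary learned from clean speech (assumption).
N = 32
atoms = [unit([math.sin(2 * math.pi * f * n / N) for n in range(N)])
         for f in (1, 2, 3, 4)]

# A component that none of the atoms can represent, standing in for noise.
off_dictionary = unit([math.cos(2 * math.pi * 13 * n / N) for n in range(N)])

# "Speech-like" frame: two dictionary atoms plus a little off-dictionary noise.
speech_like = [0.8 * a + 0.5 * b + 0.2 * c
               for a, b, c in zip(atoms[0], atoms[2], off_dictionary)]
non_speech = list(off_dictionary)

print(is_speech(speech_like, atoms), is_speech(non_speech, atoms))
```

In this toy setting the speech-like frame is well explained by two atoms while the off-dictionary frame is not, which is the kind of separation a dictionary trained on clean speech is meant to provide under noise. A real system would learn the atoms from data and would likely use a trained classifier instead of a fixed threshold.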