Environmental sound recognition with time-frequency audio features

  • Authors:
  • Selina Chu
  • Shrikanth Narayanan
  • C.-C. Jay Kuo

  • Affiliations:
  • Department of Computer Science, Signal and Image Processing Institute, University of Southern California, Los Angeles, CA
  • Ming Hsieh Department of Electrical Engineering, Department of Computer Science, Signal and Image Processing Institute, University of Southern California, Los Angeles, CA
  • Ming Hsieh Department of Electrical Engineering, Department of Computer Science, Signal and Image Processing Institute, University of Southern California, Los Angeles, CA

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2009

Abstract

The paper considers the task of recognizing environmental sounds for understanding the scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the audio spectral shape. Environmental sounds, such as insect chirps and rain, are typically noise-like with a broad, flat spectrum, yet may carry strong temporal-domain signatures. However, few temporal-domain features have previously been developed to characterize such diverse audio signals. Here, we perform an empirical feature analysis for audio environment characterization and propose using the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method selects features from a dictionary of atoms, yielding a flexible, intuitive, and physically interpretable feature set. The MP-based features are adopted to supplement the MFCC features and achieve higher recognition accuracy for environmental sounds. Extensive experiments, including listening tests that study human recognition capabilities, demonstrate the effectiveness of these joint features for unstructured environmental sound classification. Our recognition system is shown to produce performance comparable to that of human listeners.
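To illustrate the core idea, below is a minimal sketch of matching pursuit over a Gabor-style time-frequency dictionary: at each iteration the atom most correlated with the residual is selected, and its parameters (frequency, scale) plus coefficient serve as features. The dictionary grid, atom parameterization, and iteration count here are illustrative assumptions for a toy example, not the authors' exact setup.

```python
import numpy as np

def gabor_atom(n, freq, scale, shift):
    """Unit-norm Gaussian-windowed cosine atom of length n.
    freq is in cycles per sample; scale is the Gaussian width; shift is the center."""
    t = np.arange(n)
    g = np.exp(-0.5 * ((t - shift) / scale) ** 2) * np.cos(2 * np.pi * freq * t)
    return g / np.linalg.norm(g)

def build_dictionary(n):
    """Small illustrative grid of (frequency, scale, shift) atoms."""
    atoms, params = [], []
    for freq in (0.01, 0.05, 0.1, 0.2):
        for scale in (8, 32, 128):
            for shift in range(0, n, n // 8):
                atoms.append(gabor_atom(n, freq, scale, shift))
                params.append((freq, scale, shift))
    return np.array(atoms), params

def matching_pursuit(signal, atoms, params, n_iter=5):
    """Greedy MP: repeatedly subtract the best-matching atom from the residual.
    Returns the selected (freq, scale, coefficient) triples and the final residual."""
    residual = signal.astype(float).copy()
    features = []
    for _ in range(n_iter):
        corr = atoms @ residual              # inner product with every atom
        k = int(np.argmax(np.abs(corr)))     # best-matching atom
        coeff = corr[k]
        residual -= coeff * atoms[k]         # remove its contribution
        freq, scale, _ = params[k]
        features.append((freq, scale, coeff))
    return features, residual
```

Each selected atom is directly interpretable: its frequency and scale describe where the signal's energy lies in time-frequency, which is what makes MP-derived features complementary to spectral-envelope descriptors like MFCCs.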