Sound recognition

Authors: Yang Cai; Károly D. Pados
Affiliations: Carnegie Mellon University; Carnegie Mellon University
Venue: Computing with instinct
Year: 2011

Abstract

Sound recognition has been a primitive survival instinct of early mammals for over 120 million years. In the modern era, it is also the most affordable sensory channel available to us. Here we explore an auditory vigilance algorithm for detecting background sounds such as explosions, gunshots, screams, and human voices. We introduce a general algorithm for sound feature extraction, classification, and feedback. We use a Hamming window to taper the sound signals, and the short-time Fourier transform (STFT) and principal component analysis (PCA) for feature extraction. We then apply a Gaussian mixture model (GMM) for classification, and we use the confusion matrix of the classifier on the training data as feedback to redefine the sound classes for better representation, accuracy, and compression. We found that, in background sound recognition, frequency coefficients on a logarithmic scale yield better results than those on a linear scale. However, sample magnitudes on a logarithmic scale yield worse results than those on a linear scale. We also compare our results with those of the linear frequency model and of algorithms based on Mel-frequency cepstral coefficients (MFCC). We conclude that our algorithm achieves higher accuracy with the available training data. We foresee broader applications of the sound recognition method, including video triage, healthcare, robotics, and security.
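The pipeline described in the abstract (Hamming-windowed STFT, PCA, one GMM per sound class, and a confusion matrix for feedback) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names (`stft_features`, `fit_pca`, `fit_gmm`, `train`, `classify`, `confusion`) and parameter values (`frame_len=512`, `hop=256`, `n_components=4`, `k=2`) are hypothetical. The choice of diagonal covariances fitted with plain EM, and the rule of picking the class whose GMM gives the highest summed frame log-likelihood, are also assumptions. The `log_scale` flag reads "logarithmic scale" as log-compressed STFT magnitudes. The paper may mean something else, such as log-spaced frequency bins.

```python
import numpy as np


def stft_features(signal, frame_len=512, hop=256, log_scale=True):
    """Hamming-windowed STFT magnitudes, one row per frame (needs len(signal) >= frame_len)."""
    window = np.hamming(frame_len)  # tapers frame edges to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # frame_len // 2 + 1 frequency bins per frame
    # Assumption: "logarithmic scale" is read here as log-compressed coefficient values.
    return np.log(mag + 1e-10) if log_scale else mag


def fit_pca(X, n_components):
    """Return the feature mean and the top principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def log_gauss(X, means, variances):
    """(n, k) log densities of each row of X under k diagonal Gaussians."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))


def fit_gmm(X, k=2, n_iter=50, seed=0):
    """Diagonal-covariance Gaussian mixture fitted with plain EM (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), k, replace=False)]  # initialise means at random frames
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame
        log_p = log_gauss(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and (floored) variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, 0) + 1e-6
    return weights, means, variances


def gmm_loglik(X, gmm):
    """Total log-likelihood of all frames in X under one GMM."""
    weights, means, variances = gmm
    return logsumexp(log_gauss(X, means, variances) + np.log(weights), axis=1).sum()


def train(clips_by_label, n_components=4, k=2):
    """Fit one PCA on all training frames, then one GMM per sound class."""
    feats = {lab: np.vstack([stft_features(c) for c in clips])
             for lab, clips in clips_by_label.items()}
    mean, axes = fit_pca(np.vstack(list(feats.values())), n_components)
    gmms = {lab: fit_gmm((F - mean) @ axes.T, k) for lab, F in feats.items()}
    return mean, axes, gmms


def classify(clip, model):
    """Pick the class whose GMM gives the clip's frames the highest total log-likelihood."""
    mean, axes, gmms = model
    Z = (stft_features(clip) - mean) @ axes.T
    return max(gmms, key=lambda lab: gmm_loglik(Z, gmms[lab]))


def confusion(model, clips_by_label):
    """Confusion matrix over clips: rows are true classes, columns are predicted classes."""
    labels = sorted(clips_by_label)
    M = np.zeros((len(labels), len(labels)), dtype=int)
    for i, lab in enumerate(labels):
        for clip in clips_by_label[lab]:
            M[i, labels.index(classify(clip, model))] += 1
    return labels, M
```

The abstract does not say exactly how the confusion matrix is used to redefine classes. One plausible reading is that classes with large off-diagonal counts between them are merged into a single class (or a broad class is split) and the model is retrained. The `confusion` helper above only computes the matrix; the redefinition rule is left open.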