Boosting Speech/Non-speech Classification Using Averaged Mel-Frequency Cepstrum Coefficients Features

Authors:
Ziyou Xiong;Thomas S. Huang
Venue:
PCM '02 Proceedings of the Third IEEE Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Year:
2002

Abstract

AdaBoost is used to boost and select the best sequence of weak classifiers for speech/non-speech classification. The weak classifiers are chosen to be simple threshold functions. The statistical mean and variance of the Mel-frequency Cepstrum Coefficients (MFCC) over all overlapping frames of an audio file are used as audio features. Training and testing on a database of 410 audio files show asymptotic classification improvement with AdaBoost. A classification accuracy of 99.51% is achieved on the test data. A comparison of AdaBoost with Nearest Neighbor and Nearest Center classifiers is also given.
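
The pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the MFCC frames are already computed as an array of shape (n_frames, n_coeffs), uses the standard discrete AdaBoost update with labels +1 (speech) and -1 (non-speech), and searches thresholds exhaustively. The function names, the number of rounds, and the synthetic data are all hypothetical.

```python
import numpy as np


def averaged_features(mfcc_frames):
    """Mean and variance of each MFCC coefficient over all frames.

    mfcc_frames: array of shape (n_frames, n_coeffs), assumed precomputed.
    Returns one feature vector of length 2 * n_coeffs per audio file.
    """
    return np.concatenate([mfcc_frames.mean(axis=0), mfcc_frames.var(axis=0)])


def stump_predict(X, j, thr, polarity):
    """Threshold function on feature j: +1 on one side of thr, -1 on the other."""
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustively pick the threshold function with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = w[stump_predict(X, j, thr, polarity) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost_fit(X, y, n_rounds=20):
    """Discrete AdaBoost over threshold functions (assumed variant)."""
    w = np.full(X.shape[0], 1.0 / X.shape[0])  # uniform initial weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, polarity = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # vote of this weak classifier
        pred = stump_predict(X, j, thr, polarity)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified files
        w /= w.sum()
        ensemble.append((alpha, j, thr, polarity))
    return ensemble


def adaboost_predict(ensemble, X):
    """Sign of the weighted vote of all selected threshold functions."""
    score = sum(a * stump_predict(X, j, thr, p) for a, j, thr, p in ensemble)
    return np.where(score >= 0, 1, -1)


# Synthetic stand-in for real MFCC frames (hypothetical data, not the
# 410-file database from the paper): the two classes differ in mean.
rng = np.random.default_rng(0)
labels = np.array([1, -1] * 40)
X = np.stack([
    averaged_features(rng.normal(loc=float(lab), scale=1.0, size=(50, 13)))
    for lab in labels
])
ensemble = adaboost_fit(X, labels, n_rounds=5)
train_acc = (adaboost_predict(ensemble, X) == labels).mean()
print(f"training accuracy on synthetic data: {train_acc:.3f}")
```

Because each weak classifier looks at a single feature, the sequence AdaBoost selects also indicates which MFCC means and variances carry the most discriminative information, which is the sense in which boosting both combines and selects classifiers.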