A combination of data mining method with decision trees building for Speech/Music discrimination

  • Authors:
  • Qiong Wu; Qin Yan; Haojiang Deng; Jinlin Wang

  • Affiliations:
  • Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; School of Computer and Information Engineering, Hohai University, Nanjing, China; Institute of Acoustics, Chinese Academy of Sciences, Beijing, China; Institute of Acoustics, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2010

Abstract

Today's multimedia applications require a Speech/Music classifier to offer not only high accuracy but also short delay and low complexity. Here, we build a Speech/Music classifier by combining different data mining methods. The main contributions of this paper are a system obtained by analyzing the inherent validity of diverse features extracted from the audio, a hierarchical structure of oblique decision trees (HODT) built to maintain optimal performance, and a novel context-based state transform (ST) strategy applied to refine the classification results. The proposed algorithm is evaluated on a set of 702 audio files of 5-11 min each, generated from 54 speech or music files at different Signal-to-Noise Ratio (SNR) levels and with diverse noise types. The experimental results show that the proposed classifier outperforms AMR-WB+, achieving classification rates of 97.9% and 95.9% at the 10 ms frame level in pure and high-SNR (≥20 dB) environments, respectively. The post-processing ST strategy further improves system performance, particularly in low-SNR conditions (10 dB), raising the accuracy by 5.6%. In addition, the complexity of the proposed system is below 1 WMOPS, which makes it easily adaptable to many scenarios.
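
Two ideas mentioned in the abstract lend themselves to a short illustration: an oblique decision-tree node splits on a linear combination of features rather than on a single feature, and a context-based smoothing pass can suppress isolated frame-level errors before reporting a Speech/Music decision. The Python sketch below is only a minimal illustration under stated assumptions; the `weights`, `threshold`, and `min_run` values are hypothetical, and the logic is not the paper's actual HODT or ST algorithm.

```python
import numpy as np

def oblique_node(features, weights, threshold):
    """One oblique split: branch on a linear combination of features.
    (An axis-parallel tree would instead test a single feature against
    a threshold.) `weights` and `threshold` are hypothetical here."""
    return float(np.dot(features, weights)) > threshold

def state_transform(frame_labels, min_run=5):
    """Illustrative context-based smoothing: the running state switches
    only after `min_run` consecutive frames agree on the new class, so
    isolated misclassifications are absorbed. `min_run` is a made-up
    knob, not a parameter from the paper."""
    state = frame_labels[0]
    run = 0
    smoothed = []
    for label in frame_labels:
        if label != state:
            run += 1
            if run >= min_run:
                state = label
                run = 0
        else:
            run = 0
        smoothed.append(state)
    return smoothed

# Example: two stray "M" frames inside a speech segment are smoothed away,
# while a sustained run of "M" frames eventually flips the state to music.
frames = ["S"] * 20 + ["M"] * 2 + ["S"] * 20 + ["M"] * 30
print(state_transform(frames, min_run=5))
```

The hysteresis rule trades a few frames of switching delay for robustness against isolated errors, which matches the abstract's observation that the post-processing step helps most in noisy, low-SNR conditions.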