A combination of data mining method with context-based state transfer for speech/music discrimination

Authors:
Qin Yan;Qiong Wu;Haojiang Deng;Jinlin Wang
Affiliations:
School of Information and Engineering, Hohai University, Nanjing, China;Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;Institute of Acoustics, Chinese Academy of Sciences, Beijing, China
Venue:
Proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '09)
Year:
2009

References:

Wrappers for feature subset selection. Artificial Intelligence, special issue on relevance.

Construction and evaluation of a robust multifeature speech/music discriminator. Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), Volume 2.

Hierarchical classification of audio data for archiving and retrieving. Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), Volume 6.

Speech/music discrimination for multimedia applications. Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '00), Volume 4.

Protein disordered region prediction by SVM with post-processing. Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS '08).

A system for induction of oblique decision trees. Journal of Artificial Intelligence Research.

Abstract

In our previous work [1], we proposed a speech/music classifier based on a feature subset selection (FSS) tool and an oblique decision tree induced by the OC1 algorithm. In this paper, we improve that classifier with a state transfer (ST) strategy that refines the classification results by exploiting the fact that adjacent segments within an audio file are strongly correlated. The proposed algorithm is evaluated on a set of 504 audio files, each 5 to 11 minutes long, covering different types of speech and music at three signal-to-noise ratio (SNR) levels: 30 dB, 20 dB and 10 dB. The results show that, on average, the ST strategy improves music classification accuracy by 3.3% at 10 dB and by 2.3% at 20 dB while keeping speech accuracy almost unchanged. At 10 dB, the speech classification rate is also raised by 5.7% on average.
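The abstract does not spell out the transfer rules used by ST. As a minimal sketch, assuming a hysteresis-style rule (the decided class changes only after a run of at least `min_run` consecutive segments disagrees with it), context-based refinement of per-segment speech/music decisions could look like the following. The function `refine_labels` and the parameter `min_run` are hypothetical names chosen for illustration; this is not the authors' ST implementation.

```python
from itertools import groupby
from typing import List, Optional, Sequence


def refine_labels(raw: Sequence[str], min_run: int = 3) -> List[str]:
    """Hypothetical state-transfer smoother for per-segment labels.

    The decided state changes only when a run of at least `min_run`
    consecutive raw labels disagrees with it; shorter runs inherit the
    current state. This is an illustrative rule, not the paper's ST method.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    refined: List[str] = []
    state: Optional[str] = None
    # groupby yields maximal runs of identical consecutive labels.
    for label, group in groupby(raw):
        run_len = sum(1 for _ in group)
        # The first run sets the initial state; later runs must be long
        # enough to force a transfer to the other class.
        if state is None or (label != state and run_len >= min_run):
            state = label
        refined.extend([state] * run_len)
    return refined


if __name__ == "__main__":
    raw = ["speech"] * 3 + ["music"] + ["speech"] * 2 + ["music"] * 4
    print(refine_labels(raw, min_run=3))
    # ['speech', 'speech', 'speech', 'speech', 'speech', 'speech', 'music', 'music', 'music', 'music']
```

With `min_run=3`, the isolated music decision at index 3 is relabeled as speech, while the four-segment music run at the end triggers a transfer. Raising `min_run` suppresses more isolated misclassifications but delays the detection of genuine speech/music boundaries.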