Segmentation of specific speech signals from multi-dialog environment using SVM and wavelet

  • Authors:
  • T. K. Truong;Chien-Chang Lin;Shi-Huang Chen

  • Affiliations:
  • Department of Information Engineering, I-Shou University, Kaohsiung County 840, Taiwan, ROC;Department of Information Engineering, I-Shou University, Kaohsiung County 840, Taiwan, ROC;Department of Computer Science and Information Engineering, Shu-Te University, Kaohsiung County 824, Taiwan, ROC

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2007


Abstract

In this paper, a novel multi-speaker segmentation technique is presented. The technique uses wavelets and support vector machines (SVMs) to segment specific speakers' speech signals from multi-dialog environments. The proposed method first applies wavelets to extract acoustical features, such as subband power and pitch information, from the given multi-dialog speech data. Multi-speaker segmentation of the speech data is then accomplished by applying a bottom-up SVM to these acoustical features together with additional parameters, such as Mel-frequency cepstral coefficients. A public audio database, Aurora-2, is used to evaluate the performance of the proposed method. Experimental results show that the method achieves 100% multi-speaker segmentation accuracy for combinations of two speakers, and segmentation accuracy of at least 94.12% and 85.93% under 4-speaker and 8-speaker conditions, respectively.
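
The two-stage idea described in the abstract, wavelet-based frame features followed by an SVM frame classifier whose label changes mark speaker boundaries, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synthetic "speaker" signals stand in for Aurora-2 utterances, and the frame length, wavelet choice ('db4'), decomposition depth, and the use of subband log-energies alone (no pitch or cepstral features, no bottom-up clustering) are assumptions made for the example.

```python
# Minimal sketch: wavelet subband features + SVM frame classification.
# Assumptions (not from the paper): toy signals, frame length 512,
# 'db4' wavelet, 4-level decomposition, log-energy features only.
import numpy as np
import pywt
from sklearn.svm import SVC

def frame_features(signal, frame_len=512, wavelet="db4", level=4):
    """Per-frame wavelet subband log-power features."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        coeffs = pywt.wavedec(frame, wavelet, level=level)
        # One log-energy value per subband (approximation + details).
        feats.append([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])
    return np.array(feats)

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
# Two toy "speakers" with different spectral content plus noise.
spk_a = np.sin(2 * np.pi * 180 * t) + 0.1 * rng.standard_normal(fs)
spk_b = np.sin(2 * np.pi * 320 * t) + 0.1 * rng.standard_normal(fs)

# Train an SVM on frames labeled by speaker.
X = np.vstack([frame_features(spk_a), frame_features(spk_b)])
y = np.array([0] * (len(X) // 2) + [1] * (len(X) - len(X) // 2))
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# "Segment" a dialog built by concatenating the two speakers: predict a
# speaker label per frame; boundaries fall where the label changes.
dialog = np.concatenate([spk_a, spk_b])
labels = clf.predict(frame_features(dialog))
boundaries = np.flatnonzero(np.diff(labels)) + 1
print("per-frame labels:", labels)
print("segment boundary at frame(s):", boundaries)
```

In this sketch each frame's feature vector is simply the log energy of every wavelet subband; the paper's method additionally incorporates pitch information and cepstral parameters, and organizes the SVM decisions bottom-up rather than frame by frame.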