Segmentation of specific speech signals from multi-dialog environment using SVM and wavelet

  • Authors:
  • T. K. Truong;Chien-Chang Lin;Shi-Huang Chen

  • Affiliations:
  • Department of Information Engineering, I-Shou University, Kaohsiung County 840, Taiwan, ROC;Department of Information Engineering, I-Shou University, Kaohsiung County 840, Taiwan, ROC;Department of Computer Science and Information Engineering, Shu-Te University, Kaohsiung County 824, Taiwan, ROC

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2007


Abstract

In this paper, a novel multi-speaker segmentation technique is presented. The technique uses wavelets and support vector machines (SVMs) to segment specific speakers' speech signals from multi-dialog environments. The proposed method first applies wavelets to extract acoustical features, such as subband power and pitch information, from the given multi-dialog speech data. Multi-speaker segmentation of the speech data is then accomplished by applying a bottom-up SVM to these acoustical features together with additional parameters, such as Mel-frequency cepstral coefficients. A public audio database, Aurora-2, is used to evaluate the performance of the proposed method. Experimental results show that the method achieves 100% multi-speaker segmentation accuracy for combinations of two speakers, and segmentation accuracy of at least 94.12% and 85.93% under 4-speaker and 8-speaker conditions, respectively.
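
The two-stage idea described in the abstract, wavelet-based frame features followed by an SVM frame classifier whose label changes mark speaker boundaries, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the synthetic "speaker" signals stand in for Aurora-2 utterances, and the frame length, wavelet choice ('db4'), decomposition depth, and the use of subband log-energies alone (no pitch or cepstral features, no bottom-up clustering) are assumptions made for the example.

```python
# Minimal sketch: wavelet subband features + SVM frame classification.
# Assumptions (not from the paper): toy signals, frame length 512,
# 'db4' wavelet, 4-level decomposition, log-energy features only.
import numpy as np
import pywt
from sklearn.svm import SVC

def frame_features(signal, frame_len=512, wavelet="db4", level=4):
    """Per-frame wavelet subband log-power features."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        coeffs = pywt.wavedec(frame, wavelet, level=level)
        # One log-energy value per subband (approximation + details).
        feats.append([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])
    return np.array(feats)

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
# Two toy "speakers" with different spectral content plus noise.
spk_a = np.sin(2 * np.pi * 180 * t) + 0.1 * rng.standard_normal(fs)
spk_b = np.sin(2 * np.pi * 320 * t) + 0.1 * rng.standard_normal(fs)

# Train an SVM on frames labeled by speaker.
X = np.vstack([frame_features(spk_a), frame_features(spk_b)])
y = np.array([0] * (len(X) // 2) + [1] * (len(X) - len(X) // 2))
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# "Segment" a dialog built by concatenating the two speakers: predict a
# speaker label per frame; boundaries fall where the label changes.
dialog = np.concatenate([spk_a, spk_b])
labels = clf.predict(frame_features(dialog))
boundaries = np.flatnonzero(np.diff(labels)) + 1
print("per-frame labels:", labels)
print("segment boundary at frame(s):", boundaries)
```

In this sketch each frame's feature vector is simply the log energy of every wavelet subband; the paper's method additionally incorporates pitch information and cepstral parameters, and organizes the SVM decisions bottom-up rather than frame by frame.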