Detection of Overlapping Speech in Meetings Using Support Vector Machines and Support Vector Regression

  • Authors:
  • Kiyoshi Yamamoto;Futoshi Asano;Takeshi Yamada;Nobuhiko Kitawaki

  • Affiliations:
  • The authors are with the Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba-shi, 305-8573 Japan. E-mail: kyama-ieice@mmlab.cs.tsukuba.ac.jp;The author is with Media Interaction Group, Information Technology Research Institute, AIST, Tsukuba-shi, 305-8568 Japan.

  • Venue:
  • IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
  • Year:
  • 2006

Abstract

In this paper, a method of detecting overlapping speech segments in meetings is proposed. It is known that the eigenvalue distribution of the spatial correlation matrix calculated from a multiple-microphone input reflects information on the number and relative power of sound sources. However, in a reverberant sound field, the information on the number of sources contained in the eigenvalue distribution is degraded by the room reverberation. Two approaches that use the eigenvalue distribution are considered. In the Support Vector Machines approach, the eigenvalue distribution is classified into two classes (overlapping speech segments and single speech segments). In the Support Vector Regression approach, the relative power of sound sources is estimated from the eigenvalue distribution, and overlapping speech segments are detected based on the estimated relative power. The salient feature of the latter approach is that the sensitivity of detecting overlapping speech segments can be controlled simply by changing the threshold value of the relative power. The proposed method was evaluated using recorded data of an actual meeting.
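
As an illustration of the feature described in the abstract, the following is a minimal sketch, not the authors' implementation. It computes the normalized eigenvalue distribution of a spatial correlation matrix for one frequency bin and applies a threshold to a relative-power estimate. Everything specific here is an assumption made for the example: the function names, the simulated 8-microphone linear array with half-wavelength spacing, the anechoic (reverberation-free) signal model, the noise level, and the convention that a larger relative-power value means a more prominent second talker. The Support Vector Machines and Support Vector Regression stages are not reproduced; in the paper's setting they would take the eigenvalue distribution as input.

```python
import numpy as np

def eigenvalue_distribution(frames):
    """Normalized eigenvalues (descending) of the spatial correlation matrix.

    frames: complex array of shape (n_frames, n_mics), e.g. the STFT values
    of one frequency bin across microphones (an assumed input layout).
    """
    X = np.asarray(frames, dtype=complex)
    # R = average over frames of x x^H, where x is one multichannel snapshot.
    R = X.T @ X.conj() / X.shape[0]
    eigvals = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    return eigvals / eigvals.sum()

def simulate_frames(angles_deg, powers, n_mics=8, n_frames=200,
                    noise_power=0.01, rng=None):
    """Hypothetical anechoic test signal: plane-wave sources plus sensor noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    mic_index = np.arange(n_mics)
    # Steering vectors for a linear array with half-wavelength spacing.
    A = np.exp(-1j * np.pi * np.outer(mic_index, np.sin(np.deg2rad(angles_deg))))

    def complex_gaussian(shape, power):
        real = rng.standard_normal(shape)
        imag = rng.standard_normal(shape)
        return (real + 1j * imag) * np.sqrt(np.asarray(power) / 2)

    S = complex_gaussian((n_frames, len(powers)), powers)  # source signals
    N = complex_gaussian((n_frames, n_mics), noise_power)  # sensor noise
    return S @ A.T + N

def detect_overlap(estimated_relative_power, threshold):
    """Threshold step; the direction of the comparison is an assumption."""
    return bool(estimated_relative_power >= threshold)

rng = np.random.default_rng(0)
single = eigenvalue_distribution(simulate_frames([30.0], [1.0], rng=rng))
overlap = eigenvalue_distribution(simulate_frames([-20.0, 30.0], [1.0, 0.5], rng=rng))
print("single talker:", np.round(single[:3], 3))
print("two talkers:  ", np.round(overlap[:3], 3))
```

In this idealized simulation the second eigenvalue is close to the noise floor for a single talker and clearly larger when a second talker is active. Reverberation, which the abstract identifies as the difficulty, blurs this separation, and that is the motivation for learning the mapping from data. Raising or lowering `threshold` in `detect_overlap` mirrors the sensitivity control described for the regression approach.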