Voice activity detection using wavelet-based multiresolution spectrum and support vector machines and audio mixing algorithm

Authors:
Wei Xue;Sidan Du;Chengzhi Fang;Yingxian Ye
Affiliations:
Department of Electronics Science and Engineering, Nanjing University, Nanjing, P.R. China;Department of Electronics Science and Engineering, Nanjing University, Nanjing, P.R. China;Department of Electronics Science and Engineering, Nanjing University, Nanjing, P.R. China;Department of Electronics Science and Engineering, Nanjing University, Nanjing, P.R. China
Venue:
ECCV'06 Proceedings of the 2006 international conference on Computer Vision in Human-Computer Interaction
Year:
2006

Abstract

This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. As audio features, the proposed VAD uses the MFCC of a wavelet-based multiresolution spectrum together with two classical audio parameters; it prejudges silence by detecting a multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm, used in the Multipoint Control Unit (MCU) of a conference, takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing. Experiments show that the proposed VAD algorithm achieves better overall performance at all SNRs than the G.729b VAD and other VADs, that the output audio of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough to satisfy the needs of real-time transmission, and that its MCU computation is lower than that of mixing based on the G.729b VAD.
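
The idea of using short-time power as the mixing weight can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the normalization of the weights to sum to 1, the mean-square definition of short-time power, and the function names `short_time_power` and `mix_frames` are all assumptions made for illustration.

```python
from typing import List, Sequence


def short_time_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed definition of short-time power)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def mix_frames(frames: List[Sequence[float]], eps: float = 1e-12) -> List[float]:
    """Mix time-aligned frames, weighting each stream by its short-time power.

    Assumption: weights are the per-stream powers normalized to sum to 1.
    The paper's actual weighting may differ.
    """
    if not frames:
        return []
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all frames must have the same length")

    powers = [short_time_power(f) for f in frames]
    total = sum(powers)
    if total < eps:
        # Every stream is (near) silent, so the mix is silence.
        return [0.0] * n

    weights = [p / total for p in powers]
    # Each output sample is a weighted sum of the corresponding input samples.
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n)]
```

Under this assumed normalization the output is a convex combination of the inputs, so it cannot exceed the largest input amplitude, and a silent stream receives zero weight. The per-stream power computations are independent of one another, which is the kind of structure that lends itself to the parallel processing mentioned in the abstract.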