Evaluation of a Noise-Robust Multi-Stream Speaker Verification Method Using F0 Information

Authors:
Taichi Asami;Koji Iwano;Sadaoki Furui
Venue:
IEICE - Transactions on Information and Systems
Year:
2008

Abstract

We have previously proposed a noise-robust speaker verification method that uses fundamental frequency (F0) extracted with the Hough transform. The method also incorporates a technique for automatically estimating stream weights and a decision threshold. It has been confirmed to be effective for white noise at various SNRs. This paper evaluates the method under more practical in-car and elevator-hall noise conditions. The paper first describes the noise-robust F0 extraction method and our robust speaker verification method, which uses multi-stream HMMs to integrate the extracted F0 and cepstral features. It then explains the automatic stream-weight and threshold estimation method for the multi-stream speaker verification framework. This method simultaneously optimizes the stream weights and the decision threshold by combining linear discriminant analysis (LDA) with the AdaBoost technique. Experiments were conducted using Japanese connected-digit speech contaminated by white, in-car, or elevator-hall noise at various SNRs. The results show that the F0 features improve verification performance in various noisy environments, and that our stream-weight and threshold optimization method effectively estimates the control parameters so that false acceptance rates (FARs) and false rejection rates (FRRs) are balanced to achieve equal error rates (EERs) under various noisy conditions.
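To make the terms in the abstract concrete, the following is a minimal sketch of how two stream scores might be fused with a stream weight, and how a decision threshold that balances FAR and FRR could be located. It is not the paper's LDA and AdaBoost estimation procedure. The function names, the convention that the two weights sum to one, and the brute-force threshold sweep are all assumptions made for illustration.

```python
def fused_score(ll_cep, ll_f0, w_f0):
    """Combine per-stream log-likelihood scores with a stream weight.

    Assumption: the cepstral and F0 weights sum to 1, so a single
    value w_f0 in [0, 1] controls the balance between the streams.
    """
    return (1.0 - w_f0) * ll_cep + w_f0 * ll_f0


def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def eer_threshold(genuine, impostor):
    """Pick the observed score at which |FAR - FRR| is smallest.

    Hypothetical helper: a plain sweep over observed scores, not the
    automatic estimation method described in the abstract.
    """
    def gap(t):
        far, frr = far_frr(genuine, impostor, t)
        return abs(far - frr)

    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=gap)
```

With fused scores for genuine and impostor trials in hand, `eer_threshold` returns an operating point and `far_frr` reports the two error rates at that point. In noisy conditions the score distributions shift, which is why a threshold fixed in advance can drift away from the equal-error operating point.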