Concatenated phoneme models for text-variable speaker recognition

Authors:
Tomoko Matsui; Sadaoki Furui
Affiliations:
NTT Human Interface Laboratories, Musashino-Shi, Tokyo, Japan; NTT Human Interface Laboratories, Musashino-Shi, Tokyo, Japan
Venue:
ICASSP '93: Proceedings of the 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing: Speech Processing, Volume II
Year:
1993

Abstract

This paper investigates methods for creating models that accurately capture both speaker and phonetic information using only a small amount of training data per speaker. For a text-dependent speaker recognition method in which arbitrary key texts are prompted by the recognizer, speaker-specific phoneme models are necessary to identify the key text and recognize the speaker. Two methods of making speaker-specific phoneme models are discussed: phoneme-adaptation of a phoneme-independent speaker model and speaker-adaptation of universal phoneme models. We also investigate supplementing these methods with a phoneme-independent speaker model to make up for the lack of speaker information. This combination achieves a rejection rate as high as 98.5% for speech that differs from the key text and a speaker verification rate of 100.0%.
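
The abstract gives no implementation detail, so the following is only an illustrative sketch of the general idea: score an utterance against speaker-specific phoneme models concatenated in key-text order, then supplement that score with a phoneme-independent speaker model. Everything concrete here is an assumption and not taken from the paper: each phoneme is reduced to a single diagonal Gaussian (a stand-in for a full HMM), the alignment is a simple left-to-right dynamic program, the two scores are mixed with a hypothetical `weight`, and the feature values are toy numbers.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def concat_score(frames, phoneme_models, key_text):
    """Per-frame score of the best left-to-right alignment of the frames
    against the phoneme models concatenated in key-text order.
    Assumption: one Gaussian state per phoneme, at least one frame each."""
    states = [phoneme_models[p] for p in key_text]
    if not states or len(frames) < len(states):
        return NEG_INF
    prev = [NEG_INF] * len(states)
    prev[0] = log_gauss(frames[0], *states[0])
    for frame in frames[1:]:
        cur = [NEG_INF] * len(states)
        for s, state in enumerate(states):
            # Either stay in the same phoneme or advance from the previous one.
            best = max(prev[s], prev[s - 1] if s > 0 else NEG_INF)
            if best > NEG_INF:
                cur[s] = best + log_gauss(frame, *state)
        prev = cur
    return prev[-1] / len(frames)


def speaker_score(frames, speaker_model):
    """Per-frame score under a phoneme-independent speaker model
    (assumption: a single Gaussian over all frames)."""
    return sum(log_gauss(x, *speaker_model) for x in frames) / len(frames)


def combined_score(frames, phoneme_models, key_text, speaker_model, weight=0.5):
    """Hypothetical linear mix of the two scores; weight is illustrative."""
    return (weight * concat_score(frames, phoneme_models, key_text)
            + (1.0 - weight) * speaker_score(frames, speaker_model))


# Toy one-dimensional models as (mean, variance) pairs; values are made up.
phoneme_models = {"a": ([0.0], [1.0]), "k": ([5.0], [1.0])}
speaker_model = ([2.5], [9.0])
frames = [[0.1], [-0.2], [5.1], [4.9], [0.0]]  # roughly "a k a"

score_prompted = combined_score(frames, phoneme_models, ["a", "k", "a"], speaker_model)
score_other = combined_score(frames, phoneme_models, ["k", "a", "k"], speaker_model)
```

In this toy setup an utterance that matches the prompted key text scores higher than the same frames evaluated against a different text, so a threshold on the combined score could in principle reject both wrong texts and wrong speakers. How the paper actually adapts, aligns, and weights its models is not stated in the abstract.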