Identification of non-linguistic speech features

Authors:
Jean-Luc Gauvain; Lori F. Lamel
Affiliations:
LIMSI-CNRS, France; LIMSI-CNRS, France
Venue:
HLT '93 Proceedings of the workshop on Human Language Technology
Year:
1993

Abstract

Over the last decade, technological advances have been made that enable us to envision real-world applications of speech technologies. It is possible to foresee applications where a spoken query must be recognized without prior knowledge of the language being spoken, for example, information centers in public places such as train stations and airports. Other applications may require accurate identification of the speaker for security reasons, such as controlling access to confidential information or securing telephone-based transactions. Ideally, the speaker's identity can be verified continually during the transaction, in a manner completely transparent to the user. With these views in mind, this paper presents a unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. This technique is shown to be effective for text-independent language, sex, and speaker identification and can enable better and friendlier human-machine interaction. With 2 s of speech, the language can be identified with better than 99% accuracy. The sex-identification error is about 1% on a per-sentence basis. Speaker identification accuracies of 98.5% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with two utterances for both corpora. An experiment using unsupervised adaptation for speaker identification on the 168 TIMIT speakers yielded the same identification accuracies as supervised adaptation.
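The likelihood-based decision described in the abstract can be illustrated with a minimal sketch. Each class (a language, a sex, or a speaker) has its own acoustic model. The utterance is scored against every model, and the class with the highest log-likelihood is chosen. The paper's models are phone-based. Here, purely as an assumption to keep the example self-contained, each class is instead a single diagonal-covariance Gaussian mixture over feature frames, and frames are treated as independent. All names (`identify`, `toy_models`, and so on) are hypothetical. The sketch shows only the maximum-likelihood selection step, not the authors' training or adaptation procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a class model (NOT the paper's phone-based models):
# a diagonal-covariance Gaussian mixture given as (weight, means, variances) components.
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture(x: Sequence[float], components: List[Component]) -> float:
    """Log-likelihood of one frame under a mixture, computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def utterance_log_likelihood(frames: List[Sequence[float]], components: List[Component]) -> float:
    """Sum of per-frame log-likelihoods (assumes frames are independent)."""
    return sum(log_mixture(x, components) for x in frames)


def identify(frames: List[Sequence[float]], models: Dict[str, List[Component]]) -> str:
    """Return the class label whose model gives the utterance the highest log-likelihood."""
    return max(models, key=lambda label: utterance_log_likelihood(frames, models[label]))


# Toy usage: two hypothetical 2-D models and three frames near the "female" mean.
toy_models = {
    "female": [(1.0, [2.0, 2.0], [1.0, 1.0])],
    "male": [(1.0, [-2.0, -2.0], [1.0, 1.0])],
}
toy_frames = [[1.8, 2.1], [2.2, 1.9], [1.5, 2.4]]
print(identify(toy_frames, toy_models))  # female
```

Working in the log domain avoids numerical underflow when many frame likelihoods are combined. Swapping the toy mixtures for class-specific phone models would change only how each class is scored; the final argmax over classes would stay the same.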