Identification of non-linguistic speech features

  • Authors:
  • Jean-Luc Gauvain; Lori F. Lamel

  • Affiliations:
  • LIMSI-CNRS, France; LIMSI-CNRS, France

  • Venue:
  • HLT '93 Proceedings of the workshop on Human Language Technology
  • Year:
  • 1993

Abstract

Over the last decade, technological advances have made it possible to envision real-world applications of speech technologies. One can foresee applications in which a spoken query must be recognized without prior knowledge of the language being spoken, for example at information centers in public places such as train stations and airports. Other applications may require accurate identification of the speaker for security reasons, such as controlling access to confidential information or authorizing telephone-based transactions. Ideally, the speaker's identity can be verified continually during the transaction, in a manner completely transparent to the user. With these views in mind, this paper presents a unified approach to identifying non-linguistic speech features from the recorded signal using phone-based acoustic likelihoods. This technique is shown to be effective for text-independent language, sex, and speaker identification and can enable better and more friendly human-machine interaction. With 2 s of speech, the language can be identified with better than 99% accuracy. The error in sex identification is about 1% on a per-sentence basis. Speaker identification accuracies of 98.5% on TIMIT (168 speakers) and 99.2% on BREF (65 speakers) were obtained with one utterance per speaker, and 100% with two utterances for both corpora. An experiment using unsupervised adaptation for speaker identification on the 168 TIMIT speakers yielded the same identification accuracies as supervised adaptation.
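
The likelihood-based decision described in the abstract can be illustrated with a minimal sketch: score the incoming feature frames under one model per candidate class (language, sex, or speaker) and pick the class whose model gives the highest total log-likelihood. The abstract does not give the model details, so the sketch below substitutes a single diagonal Gaussian per class as a stand-in for the paper's phone-based acoustic models. The names `frame_log_likelihood`, `identify`, and `models`, and the toy numbers, are hypothetical and chosen only for illustration.

```python
import math


def frame_log_likelihood(frame, mean, var):
    """Log-density of one feature frame under a diagonal Gaussian.

    Assumption: a single Gaussian stands in for the phone-based
    acoustic models, whose details the abstract does not specify.
    """
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def identify(frames, models):
    """Return (best_label, scores).

    `models` maps a class label to a (mean, variance) pair;
    `scores` maps each label to the summed log-likelihood of all frames.
    """
    scores = {
        label: sum(frame_log_likelihood(f, mean, var) for f in frames)
        for label, (mean, var) in models.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy two-dimensional models for two hypothetical classes.
models = {
    "class_a": ([0.0, 0.0], [1.0, 1.0]),
    "class_b": ([3.0, 3.0], [1.0, 1.0]),
}

# Hypothetical feature frames lying close to the class_b mean.
frames = [[2.8, 3.1], [3.2, 2.9], [2.5, 3.4]]
best, scores = identify(frames, models)
print(best)
```

Because the per-frame log-likelihoods are summed, each additional frame adds evidence to the decision. This is in line with the abstract's report that accuracy improves when more speech is available, for example two utterances rather than one for speaker identification.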