Two language recognition algorithms are proposed and experimental results are described. While many studies have addressed the speech recognition problem, few have addressed the language recognition task. The speech data cover 20 languages: 16 sentences, each uttered twice by 4 male and 4 female speakers, with each sentence lasting about 8 seconds. The first algorithm is based on the standard vector quantization (VQ) technique: every language is characterized by its own VQ codebook, and an input utterance is assigned to the language whose codebook yields the lowest quantization distortion. The second algorithm uses a single universal (common) VQ codebook for all languages, and characterizes every language by a histogram of codeword occurrence probabilities. The experimental results show recognition rates of 65% for the first algorithm and 80% for the second, each using just 8 sentences of unknown speech (about 64 seconds).
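The two algorithms can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses plain k-means for codebook training, squared Euclidean distance throughout, and synthetic 2-D "feature" clouds in place of real speech features; all function and variable names are invented for the example.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, k, iters=20, seed=0):
    """Build a VQ codebook with plain k-means (a stand-in for the paper's VQ training)."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[min(range(k), key=lambda j: dist2(v, codebook[j]))].append(v)
        codebook = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else codebook[j]
            for j, cl in enumerate(clusters)
        ]
    return codebook

def avg_distortion(vectors, codebook):
    """Algorithm 1 score: mean quantization distortion of an utterance against a codebook."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors) / len(vectors)

def histogram(vectors, codebook):
    """Algorithm 2 feature: codeword-occurrence probability histogram under a shared codebook."""
    counts = [0] * len(codebook)
    for v in vectors:
        counts[min(range(len(codebook)), key=lambda j: dist2(v, codebook[j]))] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Toy "languages": 2-D feature clouds with different means (synthetic, for illustration only).
rng = random.Random(1)
def cloud(cx, cy, n):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]

train = {"lang_A": cloud(0, 0, 200), "lang_B": cloud(3, 3, 200)}
test_utt = cloud(3, 3, 50)  # an unseen utterance drawn from lang_B's distribution

# Algorithm 1: one codebook per language; pick the language with the lowest distortion.
books = {lang: train_codebook(vs, k=8, seed=2) for lang, vs in train.items()}
guess1 = min(books, key=lambda lang: avg_distortion(test_utt, books[lang]))

# Algorithm 2: one universal codebook; pick the language with the closest histogram.
universal = train_codebook(train["lang_A"] + train["lang_B"], k=16, seed=3)
ref_hists = {lang: histogram(vs, universal) for lang, vs in train.items()}
h = histogram(test_utt, universal)
guess2 = min(ref_hists, key=lambda lang: dist2(h, ref_hists[lang]))

print(guess1, guess2)
```

In this sketch the histogram approach mirrors the paper's key idea: once a universal codebook is fixed, a language is summarized by how often each codeword occurs, so classification reduces to comparing histograms rather than raw distortions.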