Dialect/Accent Classification Using Unrestricted Audio

Authors:
R. Huang;J. H. L. Hansen;P. Angkititrakul
Affiliations:
Dept. of Electrical Engineering, University of Texas at Dallas, Richardson, TX;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Abstract

This study addresses novel advances in English dialect/accent classification. A word-based modeling technique is proposed that is shown to outperform a large vocabulary continuous speech recognition (LVCSR)-based system at significantly lower computational cost. The new algorithm, named Word-based Dialect Classification (WDC), converts the text-independent decision problem into a text-dependent decision problem and produces multiple combination decisions at the word level rather than a single decision at the utterance level. The basic WDC algorithm also provides options for further modeling and decision strategy improvement. Two sets of classifiers are employed for WDC: a word classifier D_W(k) and an utterance classifier D_u. D_W(k) is boosted via the AdaBoost algorithm directly in the probability space instead of the traditional feature space. D_u is boosted via the dialect dependency information of the words. For a small training corpus, it is difficult to obtain a robust statistical model for each word and each dialect. Therefore, a context-adapted training (CAT) algorithm is formulated, which adapts the universal phoneme Gaussian mixture models (GMMs) to dialect-dependent word hidden Markov models (HMMs) via linear regression. Three separate dialect corpora are used in the evaluations: the Wall Street Journal (American and British English), NATO N4 (British, Canadian, Dutch, and German accented English), and IViE (eight British dialects). Significant improvement in dialect classification is achieved for all corpora tested.
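
The idea of making one decision per word and then combining those decisions at the utterance level can be illustrated with a minimal sketch. The abstract does not specify the combination rule, so the weighted vote below is an assumption made purely for illustration, not the WDC algorithm itself. The function names, the per-word log-likelihood scores, and the word weights (a stand-in for the "dialect dependency information of the words") are all hypothetical.

```python
from collections import defaultdict


def classify_word(word_scores):
    """Return the dialect with the highest score for a single word.

    word_scores maps a dialect name to the log-likelihood of the word
    under that dialect's word model (hypothetical inputs).
    """
    return max(word_scores, key=word_scores.get)


def classify_utterance(utterance, word_weights=None):
    """Combine word-level decisions into one utterance-level decision.

    utterance is a list of (word, word_scores) pairs. word_weights
    optionally maps a word to a non-negative weight standing in for how
    dialect-dependent that word is; words without a weight count as 1.0.
    This weighted vote is an assumed rule, not the one from the paper.
    """
    word_weights = word_weights or {}
    votes = defaultdict(float)
    for word, scores in utterance:
        votes[classify_word(scores)] += word_weights.get(word, 1.0)
    return max(votes, key=votes.get)


# Made-up scores for a three-word utterance and two dialects.
utterance = [
    ("water", {"american": -41.2, "british": -39.8}),
    ("better", {"american": -35.0, "british": -36.1}),
    ("the", {"american": -12.3, "british": -12.4}),
]
# Made-up weights: function words such as "the" carry little dialect cue.
weights = {"water": 2.0, "better": 1.5, "the": 0.2}

print(classify_utterance(utterance, weights))
```

With these made-up numbers, an unweighted vote favors "american" (two words to one), while the weighted vote favors "british" because the single highly weighted word outweighs the other two. This shows why weighting words by how dialect-dependent they are can change the utterance-level outcome.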