Speaker, accent, and language identification using multilingual phone strings

Authors:
Tanja Schultz; Qin Jin; Kornel Laskowski; Alicia Tribble; Alex Waibel
Affiliations:
Carnegie Mellon University, Pittsburgh, PA (all authors)
Venue:
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Year:
2002

Abstract

In this paper we investigate the identification of non-verbal cues from speech, namely speaker, accent, and language. For these tasks we develop a joint framework that uses phone strings, derived from phone recognizers for different languages, as intermediate features and makes classification decisions based on their perplexities. Our evaluation on variable-distance data demonstrated the robustness of the approach, achieving a 96.7% speaker identification rate. Furthermore, we achieved 93.7% accuracy in discriminating between native and non-native speakers. For language identification, we obtained 95.5% classification accuracy on utterances 5 seconds in length and up to 99.89% on longer utterances. The experiments were language independent: they were carried out on languages not presented to the phone recognizers during training, which suggests that the approach could be successfully ported to non-verbal cue classification in other languages.
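
To make the perplexity-based decision concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single phone stream, an add-one smoothed bigram model per class, and toy phone symbols; the names BigramPhoneLM and classify, the class labels, and the training strings are all hypothetical. The abstract does not state the n-gram order, the smoothing method, or how perplexities from several phone recognizers are combined, so none of those details should be read into this example.

```python
import math
from collections import Counter


class BigramPhoneLM:
    """Add-one smoothed bigram model over phone symbols (illustrative assumption)."""

    def __init__(self, phone_strings):
        self.bigrams = Counter()   # counts of (previous phone, current phone)
        self.contexts = Counter()  # counts of the previous phone alone
        self.vocab = {"</s>"}      # symbols that can be predicted
        for phones in phone_strings:
            seq = ["<s>"] + list(phones) + ["</s>"]
            self.vocab.update(phones)
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def perplexity(self, phones, vocab_size):
        """Perplexity of one phone string under this model."""
        seq = ["<s>"] + list(phones) + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(seq, seq[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(seq) - 1))


def classify(phones, models):
    """Return the label whose model gives the lowest perplexity."""
    # Use one shared vocabulary size so perplexities are comparable across models.
    vocab = set(phones)
    for model in models.values():
        vocab |= model.vocab
    return min(models, key=lambda label: models[label].perplexity(phones, len(vocab)))


# Toy training data: hypothetical phone strings for two hypothetical classes.
training = {
    "native": [["a", "b", "a", "b"], ["a", "b", "a"]],
    "non_native": [["b", "b", "a", "a"], ["b", "a", "a"]],
}
models = {label: BigramPhoneLM(strings) for label, strings in training.items()}

print(classify(["a", "b", "a", "b"], models))
```

In this sketch the decision rule is simply the class model with the lowest perplexity on the decoded phone string. The same rule applies whether the labels are speakers, accents, or languages, which is the sense in which the framework is joint.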