Spoken Arabic dialect identification using phonotactic modeling

Authors:
Fadi Biadsy; Julia Hirschberg; Nizar Habash
Affiliations:
Columbia University, New York; Columbia University, New York; Columbia University, New York
Venue:
Semitic '09 Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages
Year:
2009

Abstract

The Arabic language is a collection of multiple variants, among which Modern Standard Arabic (MSA) has a special status as the formal written standard language of the media, culture and education across the Arab world. The other variants are informal spoken dialects that serve as the medium of communication in daily life. Arabic dialects differ substantially from MSA and from each other in phonology, morphology, lexical choice and syntax. In this paper, we describe a system that automatically identifies the Arabic dialect (Gulf, Iraqi, Levantine, Egyptian or MSA) of a speaker from a sample of his or her speech. The phonotactic approach we use proves effective in identifying these dialects, reaching an overall accuracy of 81.60% on 30-second test utterances.
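
The abstract names a phonotactic approach but does not spell out its components. As a rough illustration of the general idea only, not of the authors' system, the sketch below trains one phone-bigram model per dialect and assigns an utterance to the dialect whose model gives its phone sequence the highest log-likelihood. Everything in it is an assumption made for illustration: the labels `dialect_A` and `dialect_B`, the toy phone strings, the add-one smoothing, and the premise that a phone recognizer has already converted the speech into phone symbols.

```python
import math
from collections import Counter

def train_bigram_model(phone_sequences, vocab):
    """Return a function giving add-one-smoothed phone-bigram log-probabilities."""
    bigrams = Counter()
    contexts = Counter()
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    # Possible successors of any context: every phone plus the end marker.
    num_outcomes = len(vocab) + 1

    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + num_outcomes))

    return logprob

def score(logprob, seq):
    """Log-likelihood of a phone sequence under one dialect's bigram model."""
    padded = ["<s>"] + list(seq) + ["</s>"]
    return sum(logprob(prev, cur) for prev, cur in zip(padded, padded[1:]))

def identify(models, seq):
    """Pick the dialect whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda dialect: score(models[dialect], seq))

# Hypothetical toy data: invented phone strings, not real dialect transcriptions.
training = {
    "dialect_A": [["g", "a", "l"], ["g", "a", "m", "a", "l"], ["g", "a", "b"]],
    "dialect_B": [["q", "a", "l"], ["j", "a", "m", "a", "l"], ["q", "a", "b"]],
}
vocab = sorted({p for seqs in training.values() for seq in seqs for p in seq})
models = {d: train_bigram_model(seqs, vocab) for d, seqs in training.items()}

print(identify(models, ["g", "a", "m"]))
print(identify(models, ["q", "a", "m"]))
```

In this toy setup the two calls select `dialect_A` and `dialect_B` respectively, because each test sequence begins with a phone that occurs only in that dialect's training data. A real system would work on recognizer output for much longer utterances and would need far more careful smoothing and evaluation.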