Main dialect identification in Mainland China, Hong Kong and Taiwan

Authors:
Dunxiao Wei;Jun-Yong Zhu;Wei-Shi Zheng;Jianhuang Lai
Affiliations:
School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, P.R. China;School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, P.R. China;School of Information Science and Technology, Sun Yat-Sen University, Guangzhou, P.R. China;School of Information Science and Technology, Sun Yat-Sen University, Guangzhou, P.R. China
Venue:
CCBR'11 Proceedings of the 6th Chinese conference on Biometric recognition
Year:
2011

Abstract

As an emerging area of speech recognition, dialect identification plays an important role in promoting applications of speech recognition technology. Since communication among Mainland China, Hong Kong and Taiwan is becoming increasingly frequent, identifying their dialects is particularly necessary. This paper makes three contributions: 1) we build a speech corpus for the main dialects of the three areas; 2) we use the popular GMM-based method to extensively evaluate identification of the main dialects between Mainland China and Hong Kong and between Mainland China and Taiwan, and we find that the differences between Mainland China Mandarin and Taiwan Mandarin are much smaller than those between Mandarin and Cantonese, which leads to unsatisfactory results in the Mainland China versus Taiwan case; 3) based on an analysis of the GMM, we propose an improved method, maximum KL distance based Gaussian component selection (MKLD-GCS), to improve dialect identification between Mainland China Mandarin and Taiwan Mandarin. Experimental results show that the proposed method obtains better identification performance than related methods.
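
The abstract names two ingredients without giving formulas: GMM-based identification and a KL distance between Gaussian components. The sketch below illustrates both in plain Python under stated assumptions: diagonal covariances, a GMM stored as a list of (weight, mean, variance) triples, and a maximum-likelihood decision rule. The function `select_components` is a hypothetical selection rule written only for illustration; the abstract does not describe how MKLD-GCS actually chooses components, so it should not be read as the authors' algorithm.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(frames, gmm):
    """Total log-likelihood of frames under a GMM of (weight, mean, var) triples."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

def identify(frames, models):
    """Return the label whose GMM assigns the highest log-likelihood."""
    return max(models, key=lambda label: gmm_loglik(frames, models[label]))

def kl_gauss_diag(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def select_components(gmm_a, gmm_b, keep):
    """Hypothetical rule (assumption): keep the `keep` components of gmm_a
    whose nearest component in gmm_b is farthest away in KL divergence,
    then renormalize the kept weights."""
    def distance(comp):
        _, m, v = comp
        return min(kl_gauss_diag(m, v, mb, vb) for _, mb, vb in gmm_b)
    chosen = sorted(gmm_a, key=distance, reverse=True)[:keep]
    total = sum(w for w, _, _ in chosen)
    return [(w / total, m, v) for w, m, v in chosen]

# Toy two-component models over 2-D features (made-up numbers).
model_a = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
model_b = [(0.5, [0.1, 0.0], [1.0, 1.0]), (0.5, [-3.0, 3.0], [1.0, 1.0])]
reduced_a = select_components(model_a, model_b, keep=1)
label = identify([[3.1, 2.9], [2.8, 3.2]], {"a": model_a, "b": model_b})
```

In this toy setup the two models share a nearly identical component near the origin, so the hypothetical rule discards it and keeps the component of `model_a` that differs most from `model_b`. The intuition is that components common to two close dialects carry little discriminative information, which is one plausible reading of why component selection could help in the Mainland China versus Taiwan Mandarin case.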