Speech confusion index (Φ): A confusion-based speech quality indicator and recognition rate prediction for dysarthria

Authors:
Prakasith Kayasith;Thanaruk Theeramunkong
Affiliations:
Sirindhorn International Institute of Technology, Thammasat University, 160 Moo 5, Tivanond Road, Bangkadi, Muang, Pathumthani 12000, Thailand and National Electronics and Computer Technology Cent ...;Sirindhorn International Institute of Technology, Thammasat University, 160 Moo 5, Tivanond Road, Bangkadi, Muang, Pathumthani 12000, Thailand
Venue:
Computers & Mathematics with Applications
Year:
2009

Citing 5
Cited 0

Introduction to algorithms.
Dysarthric speech characteristics of Thai stroke patients assessed by the computerized articulation test. Proceedings of the 1st international convention on Rehabilitation engineering & assistive technology: in conjunction with 1st Tan Tock Seng Hospital Neurorehabilitation Meeting.
Recognition rate prediction for dysarthric speech disorder via speech consistency score. PRICAI'06 Proceedings of the 9th Pacific Rim international conference on Artificial intelligence.
Speech confusion index (Ø): a recognition rate indicator for dysarthric speakers. FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing.
Development of a voice-input voice-output communication aid (VIVOCA) for people with severe dysarthria. ICCHP'06 Proceedings of the 10th international conference on Computers Helping People with Special Needs.

Quantified Score

Hi-index	0.09

Abstract

This paper presents an automated method for assessing the speech quality of a dysarthric speaker, replacing laborious and subjective manual methods. The assessment result serves as an indicator for predicting speech recognition accuracy. The proposed speech confusion index (Φ) measures the severity of a speaker's speech disorder in terms of how easily his or her speech signal may be misrecognized as other, unintended words. The method relies on signal processing alone, without any high-level information: dynamic time warping, combined with an adaptive slope constraint and an accumulative mismatch score, measures the distance between any two speech signals, whether of the same word or of two different words. Compared with articulatory and intelligibility tests, the proposed indicator predicted more accurately the recognition rates obtained from hidden Markov model (HMM) and artificial neural network (ANN) recognizers. Under three evaluation criteria, namely root-mean-square difference, correlation coefficient and rank-order inconsistency, experimental results on a phoneme-balanced set showed that Φ achieved better prediction than both the articulatory and the intelligibility tests. A further experiment on a reduced training set investigates the robustness of the proposed indicator. Finally, speech confusion is analyzed in detail at the phoneme level.
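
The distance measure named in the abstract builds on dynamic time warping. The following is a minimal sketch of plain dynamic time warping only, written to illustrate how two utterances of different lengths can be compared frame by frame. It does not reproduce the paper's adaptive slope constraint or accumulative mismatch score, and the function name `dtw_distance` and the one-dimensional toy frames are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Plain DTW distance between two sequences of feature vectors.

    Illustrative only: no slope constraint and no mismatch score,
    unlike the method described in the abstract.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between the two frames (Euclidean).
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame of b
                                 cost[i][j - 1],      # repeat frame of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical toy data: each frame is a 1-D feature vector.
word_a_take1 = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
word_a_take2 = [(0.0,), (1.0,), (1.0,), (2.0,), (1.0,), (0.0,)]  # same shape, stretched in time
word_b_take1 = [(2.0,), (2.0,), (0.0,), (0.0,), (2.0,)]

same_word = dtw_distance(word_a_take1, word_a_take2)
different_word = dtw_distance(word_a_take1, word_b_take1)
print(same_word, different_word)
```

In this toy setting, two takes of the same word that differ only in timing align at a lower cost than takes of different words. A confusion-style indicator would be concerned with cases where that ordering fails, that is, where an utterance lies closer to an unintended word than to its own.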