Speech intelligibility from image processing

Authors: Andrew Hines; Naomi Harte
Affiliation (both authors): Department of Electronic and Electrical Engineering, Sigmedia Group, Trinity College Dublin, Ireland
Venue: Speech Communication
Year: 2010



Abstract

Hearing loss research has traditionally been based on perceptual criteria, speech intelligibility, and threshold levels. The development of computational models of the auditory periphery has allowed experimentation via simulation to provide quantitative, repeatable results at a more granular level than would be practical with clinical research on human subjects. The responses of the model used in this study have previously been shown to be consistent with a wide range of physiological data from both normal and impaired ears for stimulus presentation levels spanning the dynamic range of hearing. The model output can be assessed by examining the spectro-temporal output visualised as neurograms. The effect of sensorineural hearing loss (SNHL) on phonemic structure was evaluated in this study using two types of neurograms: temporal fine structure (TFS) and average discharge rate, also called temporal envelope. A new systematic way of assessing phonemic degradation is proposed using the outputs of an auditory nerve model for a range of SNHLs. The mean structural similarity index (MSSIM) is an objective measure originally developed to assess perceptual image quality. The measure is adapted here to quantify the phonemic degradation in neurograms derived from impaired auditory nerve outputs. A full evaluation of the choice of parameters for the metric is presented using a large amount of natural human speech. The metric's boundedness and the results for TFS neurograms indicate that it is superior to the standard point-to-point metrics of relative mean absolute error and relative mean squared error. MSSIM is also promising as an indicative score of intelligibility, with results similar to those of the standard Speech Intelligibility Index (SII) metric.
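To make the comparison in the abstract concrete, the following is a minimal sketch of how a windowed similarity score and the two point-to-point error metrics might be computed on a pair of neurograms. It is not the authors' implementation. The function names, the uniform 3x3 window, the constants `k1` and `k2`, the dynamic range of 1.0, the toy data, and the exact normalisation of the two relative error metrics are all assumptions chosen for illustration; the abstract does not state them.

```python
# Illustrative sketch only: neurograms are treated as 2-D lists
# (frequency bands x time bins). All parameter values are assumptions.

def _mean(values):
    return sum(values) / len(values)

def _flatten(img):
    return [v for row in img for v in row]

def _windows(img, size):
    """Yield every size x size patch of a 2-D list as a flat list."""
    rows, cols = len(img), len(img[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield [img[r + i][c + j] for i in range(size) for j in range(size)]

def mssim(ref, deg, size=3, dynamic_range=1.0, k1=0.01, k2=0.03):
    """Mean of local structural similarity scores over all window positions.

    Uses a uniform square window and population (co)variances, a
    simplification of the usual weighted-window formulation.
    """
    if len(ref) != len(deg) or len(ref[0]) != len(deg[0]):
        raise ValueError("neurograms must have the same shape")
    c1 = (k1 * dynamic_range) ** 2  # stabilises the intensity term
    c2 = (k2 * dynamic_range) ** 2  # stabilises the contrast/structure term
    scores = []
    for x, y in zip(_windows(ref, size), _windows(deg, size)):
        mx, my = _mean(x), _mean(y)
        vx = _mean([(a - mx) * (a - mx) for a in x])
        vy = _mean([(b - my) * (b - my) for b in y])
        cxy = _mean([(a - mx) * (b - my) for a, b in zip(x, y)])
        num = (2 * mx * my + c1) * (2 * cxy + c2)
        den = (mx * mx + my * my + c1) * (vx + vy + c2)
        scores.append(num / den)
    return _mean(scores)

def relative_mae(ref, deg):
    """Mean absolute error normalised by the mean absolute reference value."""
    x, y = _flatten(ref), _flatten(deg)
    return _mean([abs(a - b) for a, b in zip(x, y)]) / _mean([abs(a) for a in x])

def relative_mse(ref, deg):
    """Mean squared error normalised by the mean squared reference value."""
    x, y = _flatten(ref), _flatten(deg)
    return _mean([(a - b) ** 2 for a, b in zip(x, y)]) / _mean([a * a for a in x])

# Toy data (hypothetical): an "unimpaired" neurogram and an attenuated copy.
reference = [
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.4, 0.6, 0.8],
    [0.1, 0.3, 0.5, 0.7],
    [0.0, 0.2, 0.4, 0.6],
]
degraded = [[0.5 * v for v in row] for row in reference]

print("MSSIM:", mssim(reference, degraded))
print("RMAE: ", relative_mae(reference, degraded))
print("RMSE: ", relative_mse(reference, degraded))
```

In this sketch the similarity score cannot exceed 1 and equals 1 for identical inputs, whereas the relative error metrics have no upper limit as the degraded neurogram diverges from the reference. That contrast is one way to read the abstract's remark about boundedness.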