Evaluating the intelligibility benefit of speech modifications in known noise conditions

Authors:
Martin Cooke; Catherine Mayo; Cassia Valentini-Botinhao; Yannis Stylianou; Bastian Sauert; Yan Tang
Affiliations:
Ikerbasque (Basque Science Foundation), Bilbao, Spain and Language and Speech Laboratory, Universidad del Pais Vasco, Vitoria, Spain; Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK; Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK; ICS-FORTH, Institute for Computer Science, Crete, Greece; Institute of Communication Systems and Data Processing, RWTH Aachen University, Aachen, Germany; Language and Speech Laboratory, Universidad del Pais Vasco, Vitoria, Spain
Venue:
Speech Communication
Year:
2013

Abstract

The use of live and recorded speech is widespread in applications where correct message reception is important, and synthetic speech is increasingly deployed in such applications. Modifications to natural and synthetic speech have therefore been proposed with the aim of improving intelligibility in noise. The current study compares the benefits of speech modification algorithms in a large-scale speech intelligibility evaluation and quantifies the equivalent intensity change, defined as the level change, in decibels, that would have to be applied to unmodified speech for it to reach the same intelligibility as modified speech. Listeners identified keywords in phonetically balanced sentences representing ten different types of speech: plain and Lombard speech, five types of modified speech, and three forms of synthetic speech. Sentences were masked by either a stationary masker or a competing-speech masker. The modification methods varied in the manner and degree to which they exploited estimates of the masking noise. The best-performing modifications led to equivalent intensity changes of around 5 dB at moderate and high noise levels for the stationary masker, and of 3-4 dB in the presence of competing speech. These gains exceed those produced by Lombard speech. Synthetic speech in noise was always less intelligible than plain natural speech, but modifying the synthetic speech significantly reduced this deficit.
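The equivalent intensity change can be illustrated with a minimal sketch, assuming that the intelligibility of unmodified speech follows a logistic psychometric function of signal-to-noise ratio (SNR) whose midpoint and slope have already been fitted to listener keyword scores. Under that assumption, the score obtained by modified speech at a given SNR is mapped back through the inverse function to the SNR at which unmodified speech would achieve it; the difference is the equivalent intensity change. The function names and the parameter values in the usage line are hypothetical and not taken from the study, which may have estimated the quantity differently.

```python
import math


def logistic_intelligibility(snr_db, midpoint_db, slope):
    """Proportion of keywords correct for unmodified speech (assumed logistic in SNR)."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def inverse_logistic(p, midpoint_db, slope):
    """SNR (dB) at which the assumed logistic function yields proportion p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return midpoint_db + math.log(p / (1.0 - p)) / slope


def equivalent_intensity_change(score_modified, snr_db, midpoint_db, slope):
    """Level change (dB) unmodified speech at snr_db would need to match score_modified."""
    snr_equivalent = inverse_logistic(score_modified, midpoint_db, slope)
    return snr_equivalent - snr_db


# Hypothetical example: unmodified-speech function with midpoint -6 dB and
# slope 0.5 per dB; a modified condition scores 70% correct at -9 dB SNR.
print(round(equivalent_intensity_change(0.7, -9.0, -6.0, 0.5), 2))  # -> 4.69
```

A positive result means the modification is worth that many decibels of extra level for unmodified speech. A parametric fit is used here because it allows inversion at any score; interpolating between measured points would be a simpler alternative when the psychometric function is sampled densely.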