Automatic speech recognition and speech variability: A review

Authors:
M. Benzeghiba;R. De Mori;O. Deroo;Stephane Dupont;T. Erbes;D. Jouvet;L. Fissore;P. Laface;A. Mertins;C. Ris;R. Rose;V. Tyagi;C. Wellekens
Affiliations:
Multitel, Parc Initialis, Avenue Copernic, B-7000 Mons, Belgium
Venue:
Speech Communication
Year:
2007


Abstract

Major progress is regularly reported in both the technology and the exploitation of automatic speech recognition (ASR) and spoken language systems. However, technological barriers to flexible solutions and user satisfaction remain under some circumstances. These stem from several factors, such as sensitivity to the environment (background noise) or the weak representation of grammatical and semantic knowledge. Current research also emphasizes deficiencies in dealing with the variation naturally present in speech. For instance, the lack of robustness to foreign accents precludes use by specific populations. Also, some applications, like directory assistance, particularly stress the core recognition technology because of their very large active vocabulary (application perplexity). Many factors affect the realization of speech: regional, sociolinguistic, or related to the environment or the speaker herself. These create a wide range of variations that may not be modeled correctly (speaker, gender, speaking rate, vocal effort, regional accent, speaking style, non-stationarity, etc.), especially when resources for system training are scarce. This paper outlines current advances related to these topics.