Analysis of the visual Lombard effect and automatic recognition experiments

Authors:
Panikos Heracleous;Carlos T. Ishi;Miki Sato;Hiroshi Ishiguro;Norihiro Hagita
Affiliations:
ATR, Intelligent Robotics and Communication Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto-fu 619-0288, Japan;ATR, Intelligent Robotics and Communication Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto-fu 619-0288, Japan;ATR, Intelligent Robotics and Communication Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto-fu 619-0288, Japan;ATR, Hiroshi Ishiguro Laboratory, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto-fu 619-0288, Japan;ATR, Intelligent Robotics and Communication Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto-fu 619-0288, Japan
Venue:
Computer Speech and Language
Year:
2013


Abstract

This study focuses on automatic visual speech recognition in the presence of noise. The authors show that, when speech is produced in noisy environments, articulatory changes occur because of the Lombard effect; these changes are both audible and visible. The authors analyze the visual Lombard effect and its role in automatic visual and audiovisual speech recognition. Experimental results using both English and Japanese data demonstrate that the Lombard effect degrades performance in the visual speech domain: when a lip-reading system is designed without taking this factor into account, its performance decreases. This matters particularly for automatic audiovisual speech recognition in real noisy environments, where recognition rates decrease both because of the acoustic noise and because of the Lombard effect. The authors further show that the performance of an audiovisual speech recognizer also depends on the visual Lombard effect and can be improved when this effect is considered in designing such a system.