A review of speech-based bimodal recognition

Authors:
C. C. Chibelushi; F. Deravi; J. S. D. Mason
Affiliations:
Sch. of Comput., Staffordshire Univ., Stafford;-;-
Venue:
IEEE Transactions on Multimedia
Year:
2002

Citing 0
Cited 38

A singer identification technique for content-based classification of MP3 music objects
Proceedings of the eleventh international conference on Information and knowledge management

Multimodal speaker/speech recognition using lip motion, lip texture and audio
Signal Processing - Special section: Multimodal human-computer interfaces

Audio-visual person authentication using lip-motion from orientation maps
Pattern Recognition Letters

Audio-visual speech processing: progress and challenges
VisHCI '06 Proceedings of the HCSNet workshop on Use of vision in human-computer interaction - Volume 56

Audio-visual speaker verification using continuous fused HMMs
VisHCI '06 Proceedings of the HCSNet workshop on Use of vision in human-computer interaction - Volume 56

Audiovisual speech synchrony measure: application to biometrics
EURASIP Journal on Applied Signal Processing

Synergy of Lip-Motion and Acoustic Features in Biometric Speech and Speaker Recognition
IEEE Transactions on Computers

Temporal filtering of visual speech for audio-visual speech recognition in acoustically and visually challenging environments
Proceedings of the 9th international conference on Multimodal interfaces

Multimodal person authentication using speech, face and visual speech
Computer Vision and Image Understanding

Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment
Speech Communication

MIT Lincoln Laboratory Multimodal Person Identification System in the CLEAR 2007 Evaluation
Multimodal Technologies for Perception of Humans

Biometric person authentication with liveness detection based on audio-visual fusion
International Journal of Biometrics

A method towards biometric feature fusion
International Journal of Biometrics

Dynamic visual features for audio-visual speaker verification
Computer Speech and Language

Improving speech recognition on a mobile robot platform through the use of top-down visual queues
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Design and implementation of a lip reading system in smart phone environment
IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration

Feature Fusion Applied to Missing Data ASR with the Combination of Recognizers
Journal of Signal Processing Systems

Automatic visual feature extraction for mandarin audio-visual speech recognition
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics

Automatic lip contour extraction from color images
Pattern Recognition

Audio-visual speaker identification based on the use of dynamic audio and visual features
AVBPA'03 Proceedings of the 4th international conference on Audio- and video-based biometric person authentication

A Bayesian approach to audio-visual speaker identification
AVBPA'03 Proceedings of the 4th international conference on Audio- and video-based biometric person authentication

Lip biometrics for digit recognition
CAIP'07 Proceedings of the 12th international conference on Computer analysis of images and patterns

Evolving spiking neural networks for audiovisual information processing
Neural Networks

Hybrid simulated annealing and its application to optimization of hidden Markov models for visual speech recognition
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics - Special issue on gait analysis

Multimedia sensor fusion for retrieving identity in biometric access control systems
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)

Identity retrieval in biometric access control systems using multimedia fusion
ICONIP'10 Proceedings of the 17th international conference on Neural information processing: models and applications - Volume Part II

Speech recognition by integrating audio, visual and contextual features based on neural networks
ICNC'05 Proceedings of the First international conference on Advances in Natural Computation - Volume Part II

Speech recognition with multi-modal features based on neural networks
ICONIP'06 Proceedings of the 13th international conference on Neural Information Processing - Volume Part II

Multi-level fusion of audio and visual features for speaker identification
ICB'06 Proceedings of the 2006 international conference on Advances in Biometrics

VALID: a new practical audio-visual database, and comparative results
AVBPA'05 Proceedings of the 5th international conference on Audio- and Video-Based Biometric Person Authentication

Audio-Visual speaker identification via adaptive fusion using reliability estimates of both modalities
AVBPA'05 Proceedings of the 5th international conference on Audio- and Video-Based Biometric Person Authentication

Dual-mode decision fusion for fingerprint and finger vein recognition based on image quality evaluation
International Journal of Biometrics

Adaptive Reliability Measure and Optimum Integration Weight for Decision Fusion Audio-visual Speech Recognition
Journal of Signal Processing Systems

Lipreading procedure based on dynamic programming
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part I

Speaker and digit recognition by audio-visual lip biometrics
ICB'07 Proceedings of the 2007 international conference on Advances in Biometrics

Audio visual person authentication by multiple nearest neighbor classifiers
ICB'07 Proceedings of the 2007 international conference on Advances in Biometrics

Fractional-order embedding canonical correlation analysis and its applications to multi-view dimensionality reduction and recognition
Pattern Recognition

Biometric fusion by simulated annealing
International Journal of Knowledge-based and Intelligent Engineering Systems

Abstract

Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. The paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues as well as possible application domains.