Multi channel sequence processing

Authors:
Samy Bengio;Hervé Bourlard
Affiliations:
IDIAP Research Institute, Martigny, Switzerland;IDIAP Research Institute, Martigny, Switzerland
Venue:
Proceedings of the First international conference on Deterministic and Statistical Methods in Machine Learning
Year:
2004

Abstract

This paper summarizes some of the current research challenges arising from multi-channel sequence processing. Many real-life applications involve the simultaneous recording and analysis of multiple information sources, which may be asynchronous, have different frame rates, exhibit different stationarity properties, and carry complementary (or correlated) information. Some of these problems can already be tackled by one of the many statistical approaches to sequence modeling. However, several challenging research issues remain open, such as accounting for asynchrony and correlation between several feature streams, or handling the resulting growth in complexity. In this framework, we discuss two novel approaches that have recently begun to be investigated with success in the context of large multimodal problems: the asynchronous HMM, which provides a principled approach to the processing of multiple feature streams, and the layered HMM, which provides a good formalism for decomposing large and complex (multi-stream) problems into layered architectures. As briefly reported here, the combination of these two approaches has yielded successful results on several multi-channel tasks, ranging from audio-visual speech recognition to automatic meeting analysis.