Multimodal model integration for sentence unit detection

Authors:
Mary P. Harper;Elizabeth Shriberg
Affiliations:
Purdue University, West Lafayette, IN;SRI International, Menlo Park, CA
Venue:
Proceedings of the 6th international conference on Multimodal interfaces
Year:
2004

Abstract

In this paper, we adopt a direct modeling approach that uses conversational gesture cues to detect the boundaries of sentence units (SUs) in videotaped conversations. We treat SU detection as a classification task: for each inter-word boundary, the classifier decides whether or not an SU boundary is present. In addition to gesture cues, we use prosodic and lexical knowledge sources. In this first investigation, we find that gesture features complement the prosodic and lexical knowledge sources for this task. The model that combines all of the knowledge sources achieves the lowest overall SU detection error rate.
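
The classification framing in the abstract can be illustrated with a minimal sketch. It is not the authors' model. The feature names (`pause_sec`, `gesture_hold`), the hand-set scoring rule, the toy data, and the error-rate formula (misses plus false alarms, divided by the number of reference SU boundaries) are all assumptions made for illustration. As a simplification, the position after every word is treated as a candidate boundary, including the final one.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Boundary:
    """Cues observed at the boundary that follows one word (all hypothetical)."""
    word: str           # lexical cue: the word before the boundary
    pause_sec: float    # prosodic cue: silence after the word, in seconds
    gesture_hold: bool  # gesture cue: hands at rest at the boundary


def toy_classifier(b: Boundary) -> bool:
    """Hand-set scoring rule standing in for a trained classifier."""
    score = 0.0
    score += 1.0 if b.pause_sec >= 0.5 else 0.0
    score += 0.5 if b.gesture_hold else 0.0
    score += 0.5 if b.word in {"okay", "right"} else 0.0
    return score >= 1.0


def detect_sus(boundaries: Sequence[Boundary],
               classify: Callable[[Boundary], bool]) -> List[bool]:
    """Make one SU / no-SU decision per candidate boundary."""
    return [classify(b) for b in boundaries]


def su_error_rate(reference: Sequence[bool], hypothesis: Sequence[bool]) -> float:
    """(misses + false alarms) / number of reference SU boundaries (assumed metric)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    n_ref = sum(reference)
    if n_ref == 0:
        raise ValueError("reference contains no SU boundaries")
    misses = sum(r and not h for r, h in zip(reference, hypothesis))
    false_alarms = sum(h and not r for r, h in zip(reference, hypothesis))
    return (misses + false_alarms) / n_ref


# Toy data, invented for this sketch.
words = ["so", "we", "moved", "it", "okay", "then", "he", "left"]
pauses = [0.1, 0.0, 0.1, 0.6, 0.3, 0.1, 0.0, 0.9]
holds = [False, False, False, True, True, False, False, True]
reference = [False, False, False, True, False, False, False, True]

boundaries = [Boundary(w, p, h) for w, p, h in zip(words, pauses, holds)]
hypothesis = detect_sus(boundaries, toy_classifier)
error = su_error_rate(reference, hypothesis)
```

In this toy run the rule marks the boundaries after "it" and "left" correctly and raises one false alarm after "okay", so `error` is 0.5. A trained classifier would replace `toy_classifier` without changing the per-boundary decision loop.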