An accurate estimation of sentence units (SUs) in spontaneous speech is important for (1) helping listeners better understand speech content and (2) supporting other natural language processing tasks that require sentence information. There has been much research on automatic SU detection; however, most previous studies have used only lexical and prosodic cues, not nonverbal cues such as gesture. Gestures play an important role in human conversations: they provide semantic content, express emotional status, and regulate conversational structure. Given the close relationship between gestures and speech, gestures may provide additional contributions to automatic SU detection. In this paper, we investigate the use of gesture cues for enhancing SU detection. In particular, we focus on: (1) collecting multimodal data resources involving gestures and SU events in human conversations, (2) analyzing the collected data sets to enrich our knowledge about the co-occurrence of gestures and SUs, and (3) building statistical models for detecting SUs using speech and gestural cues. Our data analyses suggest that some gesture patterns influence a word boundary's probability of being an SU. On the basis of these analyses, a set of novel gestural features was proposed for SU detection. A combination of speech and gestural features was found to provide more accurate SU predictions than speech features alone in discriminative models. The findings in this paper support the view that human conversations are processes involving multimodal cues, and so they are more effectively modeled using information from both verbal and nonverbal channels.
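To make the modeling idea concrete, the sketch below frames SU detection as binary classification of word boundaries with a discriminative (maximum-entropy-style logistic regression) model over combined prosodic and gestural features. The feature names (pause duration, upcoming filler word, gesture returning to rest) and the toy data are illustrative assumptions, not the paper's actual feature set or corpus:

```python
import math

def train_logreg(X, y, lr=0.5, epochs=300):
    """Batch gradient descent for binary logistic regression -- a two-class
    maximum entropy model, standing in for the paper's discriminative models.
    Returns learned weights; the last entry is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))   # P(SU boundary | features)
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(len(w)):
            w[j] -= lr * grad[j] / len(X)
    return w

def predict(w, xi):
    """1 = word boundary is an SU boundary, 0 = not."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    return 1 if z > 0 else 0

# Hypothetical word-boundary feature vectors:
# [pause duration (s), next word is a filler, gesture returns to rest]
X = [
    [0.90, 0, 1],  # long pause + gesture ends -> SU boundary
    [0.70, 0, 1],
    [0.80, 1, 1],
    [0.10, 1, 0],  # short pause + ongoing gesture -> no boundary
    [0.20, 0, 0],
    [0.05, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]

w = train_logreg(X, y)
```

The gestural cue here is a single binary feature appended to the prosodic ones; combining channels then amounts to letting the discriminative model weight all features jointly, which mirrors the abstract's claim that speech plus gesture outperforms speech alone.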