Utilizing gestures to improve sentence boundary detection

  • Authors:
  • Lei Chen; Mary P. Harper

  • Affiliations:
  • Lei Chen: School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA 47905; Educational Testing Service (ETS), Princeton, USA 08541
  • Mary P. Harper: Department of Computer Science, University of Maryland, College Park, USA 20742; Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, USA 21211

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2011

Abstract

An accurate estimation of sentence units (SUs) in spontaneous speech is important for (1) helping listeners to better understand speech content and (2) supporting other natural language processing tasks that require sentence information. There has been much research on automatic SU detection; however, most previous studies have used only lexical and prosodic cues, and have not used nonverbal cues, e.g., gesture. Gestures play an important role in human conversations, including providing semantic content, expressing emotional status, and regulating conversational structure. Given the close relationship between gestures and speech, gestures may provide additional contributions to automatic SU detection. In this paper, we investigate the use of gesture cues for enhancing SU detection. In particular, we focus on: (1) collecting multimodal data resources involving gestures and SU events in human conversations, (2) analyzing the collected data sets to enrich our knowledge about the co-occurrence of gestures and SUs, and (3) building statistical models for detecting SUs using speech and gestural cues. Our data analyses suggest that some gesture patterns influence the probability of a word boundary being an SU boundary. On the basis of these analyses, a set of novel gestural features was proposed for SU detection. A combination of speech and gestural features was found to provide more accurate SU predictions than speech features alone in discriminative models. The findings in this paper support the view that human conversations are processes involving multimodal cues, and so they are more effectively modeled using information from both verbal and nonverbal channels.
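To make the modeling setup concrete, the sketch below frames SU detection as word-boundary classification with combined speech and gestural features. This is not the paper's implementation: the feature names and values are hypothetical illustrations, and scikit-learn logistic regression merely stands in for the discriminative models evaluated in the paper.

    # Minimal sketch: classify each word boundary as SU vs. non-SU by combining
    # speech (lexical/prosodic) and gestural features in a discriminative model.
    # Feature names/values are hypothetical; logistic regression is a stand-in.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # One feature dict per word boundary (toy examples).
    boundaries = [
        {"word": "okay",  "pause_sec": 0.62, "pitch_reset": 1, "gesture_hold": 1},
        {"word": "and",   "pause_sec": 0.05, "pitch_reset": 0, "gesture_hold": 0},
        {"word": "right", "pause_sec": 0.48, "pitch_reset": 1, "gesture_hold": 1},
        {"word": "the",   "pause_sec": 0.02, "pitch_reset": 0, "gesture_hold": 0},
    ]
    labels = [1, 0, 1, 0]  # 1 = SU boundary, 0 = non-boundary

    model = make_pipeline(DictVectorizer(sparse=False), LogisticRegression())
    model.fit(boundaries, labels)

    # Probability that a new word boundary (with a pause, pitch reset, and a
    # gesture hold) is an SU boundary.
    test = {"word": "yeah", "pause_sec": 0.55, "pitch_reset": 1, "gesture_hold": 1}
    print(model.predict_proba([test])[0][1])

The point of the sketch is only the feature-combination step: gestural cues (here the hypothetical "gesture_hold") enter the classifier alongside lexical and prosodic cues, so the model can weight verbal and nonverbal evidence jointly when scoring each boundary.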