Speech repairs, intonational phrases, and discourse markers: modeling speakers' utterances in spoken dialogue

Authors:
Peter A. Heeman; James F. Allen
Affiliations:
Oregon Graduate Institute; University of Rochester
Venue:
Computational Linguistics
Year:
1999

Abstract

Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker's intended utterances: both segmenting a speaker's turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by speech repairs, which occur when speakers go back and change (or repeat) something they just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. Segmenting turns and resolving repairs are strongly intertwined with a third task: identifying discourse markers. Because of these interactions, and further interactions with POS tagging and speech recognition, we need to address these tasks together and early on in the processing stream. This paper presents a statistical language model in which we redefine the speech recognition problem so that it includes the identification of POS tags, discourse markers, speech repairs, and intonational phrases. By solving these simultaneously, we obtain better results on each task than by addressing them separately. Our model identifies 72% of turn-internal intonational boundaries with 71% precision and 97% of discourse markers with 96% precision, and detects and corrects 66% of repairs with 74% precision.
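
The redefinition of the speech recognition problem described in the abstract can be sketched in the usual noisy-channel notation. The symbols below are assumptions made for illustration and do not come from the abstract: A stands for the acoustic signal, W for a word sequence, and T for a single bundle of the extra annotations (POS tags, discourse markers, speech repairs, and intonational phrase boundaries). The paper's own notation and factorization may differ.

```latex
% Conventional recognition: choose the most likely word sequence W
% given the acoustic signal A (the second equality is Bayes' rule,
% dropping the constant Pr(A)).
\begin{align*}
\hat{W} &= \arg\max_{W} \Pr(W \mid A)
         = \arg\max_{W} \Pr(A \mid W)\,\Pr(W)
\end{align*}

% Redefined problem (assumed notation): search jointly over the words W
% and the annotation sequence T, so the language model term becomes
% Pr(W, T) instead of Pr(W).
\begin{align*}
(\hat{W}, \hat{T}) &= \arg\max_{W,\,T} \Pr(W, T \mid A)
                    = \arg\max_{W,\,T} \Pr(A \mid W, T)\,\Pr(W, T)
\end{align*}
```

Under this reading, the statistical language model is the component that supplies the joint term Pr(W, T). Because all the annotations are scored together with the words, evidence for one task can influence the others, which is the interaction the abstract credits for the improved results.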