Automatic identification of discourse markers in dialogues: An in-depth study of like and well

Authors:
Andrei Popescu-Belis; Sandrine Zufferey
Affiliations:
Idiap Research Institute, PO Box 592, 1920 Martigny, Switzerland; Department of Linguistics, University of Geneva, 1211 Geneva 4, Switzerland
Venue:
Computer Speech and Language
Year:
2011


Abstract:
The lexical items like and well can serve as discourse markers (DMs), but can also play numerous other roles, such as verb or adverb. Identifying the occurrences that function as DMs is an important step for language understanding by computers. In this study, automatic classifiers using lexical, prosodic/positional and sociolinguistic features are trained over transcribed dialogues, manually annotated with DM information. The resulting classifiers improve state-of-the-art performance of DM identification, at about 90% recall and 79% precision for like (84.5% accuracy, κ = 0.69), and 99% recall and 98% precision for well (97.5% accuracy, κ = 0.88). Automatic feature analysis shows that lexical collocations are the most reliable indicators, followed by prosodic/positional features, while sociolinguistic features are marginally useful for the identification of DM like and not useful for well. The differentiated processing of each type of DM improves classification accuracy, suggesting that these types should be treated individually.
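
The abstract reports four scores per marker: recall, precision, accuracy and kappa. As a minimal sketch of how such scores relate to one another, the Python snippet below derives all four from a 2x2 confusion matrix of classifier output against reference annotation. It assumes kappa is Cohen's kappa computed between the classifier and the reference, which the abstract does not spell out. The names `Confusion` and `dm_scores` are hypothetical, and the counts are invented for illustration only; they are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix for the binary task 'DM' vs. 'non-DM'."""
    tp: int  # reference DM, classified as DM
    fp: int  # reference non-DM, classified as DM
    fn: int  # reference DM, classified as non-DM
    tn: int  # reference non-DM, classified as non-DM


def dm_scores(c: Confusion) -> dict[str, float]:
    """Return recall, precision, accuracy and Cohen's kappa.

    Assumes every marginal total is non-zero (no division-by-zero handling).
    """
    n = c.tp + c.fp + c.fn + c.tn
    recall = c.tp / (c.tp + c.fn)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / n  # observed agreement

    # Agreement expected by chance, from the marginal label proportions.
    p_both_dm = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_both_non = ((c.fn + c.tn) / n) * ((c.fp + c.tn) / n)
    p_chance = p_both_dm + p_both_non

    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = Confusion(tp=90, fp=24, fn=10, tn=76)
scores = dm_scores(example)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Kappa discounts the agreement that the marginal label frequencies would produce by chance, which is why it can sit well below accuracy when one class dominates.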