Learning information status of discourse entities

Authors:
Malvina Nissim
Affiliations:
National Research Council (ISTC-CNR), Roma, Italy
Venue:
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Year:
2006

Abstract

In this paper we address the issue of automatically assigning information status to discourse entities. Using an annotated corpus of conversational English and exploiting morpho-syntactic and lexical features, we train a decision tree to classify entities introduced by noun phrases as old, mediated, or new. We compare its performance with hand-crafted rules that are mainly based on morpho-syntactic features and closely relate to the guidelines that had been used for the manual annotation. The decision tree model achieves an overall accuracy of 79.5%, significantly outperforming the hand-crafted algorithm (64.4%). We also experiment with binary classifications by collapsing in turn two of the three target classes into one and retraining the model. The highest accuracy achieved on binary classification is 93.1%.
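
The binary experiments described in the abstract merge two of the three target classes into one before the model is retrained. A minimal Python sketch of that relabelling step and of accuracy scoring follows. The function names, the joint-label format, and the toy label sequences are assumptions made for illustration and are not taken from the paper. The sketch does not train a decision tree or reproduce the reported figures.

```python
from itertools import combinations

# The three information status classes named in the abstract.
CLASSES = ("old", "mediated", "new")


def collapse(labels, merged):
    """Replace every label in `merged` with one joint label (hypothetical format)."""
    joint = "+".join(c for c in CLASSES if c in merged)
    return [joint if label in merged else label for label in labels]


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def binary_setups():
    """All ways of merging two of the three classes into one."""
    return [set(pair) for pair in combinations(CLASSES, 2)]


# Toy labels, invented for illustration only.
gold = ["old", "new", "mediated", "old", "new", "mediated"]
pred = ["old", "mediated", "mediated", "new", "new", "old"]

if __name__ == "__main__":
    print("three-way accuracy:", accuracy(gold, pred))
    for merged in binary_setups():
        # In the paper the model is retrained on the collapsed labels;
        # here the same toy predictions are simply relabelled.
        score = accuracy(collapse(gold, merged), collapse(pred, merged))
        print(sorted(merged), round(score, 3))
```

In an actual replication, `collapse` would be applied to the training and test labels before fitting a new classifier for each of the three binary setups.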