Automatically learning cognitive status for multi-document summarization of newswire

Authors:
Ani Nenkova;Advaith Siddharthan;Kathleen McKeown
Affiliations:
Columbia University;Columbia University;Columbia University
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005

Abstract

Machine summaries can be improved by using knowledge about the cognitive status of news article referents. In this paper, we present an approach to automatically acquiring distinctions in cognitive status using machine learning over the forms of referring expressions appearing in the input. We focus on modeling references to people, both because news often revolves around people and because existing natural language tools for named entity identification are reliable. We examine two specific distinctions: whether a person in the news can be assumed to be known to a target audience (hearer-old vs. hearer-new), and whether a person is a major character in the news story. We report on machine learning experiments showing that these distinctions can be learned with high accuracy, and we validate our approach using human subjects.