Human-written, high-quality extractive summaries pay great attention to the text that intermixes the extracts. In this work, we focused on the lexical choice of verbs that introduce quoted text. We analyzed over 4,000 high-quality summaries from a high-traffic mailing list and manually assembled 39 classes of quotation-introducing verbs that cover the majority of verb occurrences. A significant amount of the data is covered by ongoing work on e-mail "speech acts." However, we found that one third of the "tail" is composed of "risky" verbs that will most likely remain beyond the state of the art for some time. We use this fact to highlight the trade-offs of risk taking in NLG, where interesting prose may come at the cost of unsettling some readers.
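The verb-class inventory described above can be pictured as a simple lookup from a quotation-introducing verb to its class, with uncovered "tail" verbs falling through. This is only a minimal illustrative sketch: the class names and verb assignments below are invented for demonstration, not taken from the paper's 39 classes.

```python
# Hypothetical sketch of a quotation-introducing verb-class inventory.
# Class names and member verbs are illustrative placeholders; the paper's
# actual inventory comprises 39 manually assembled classes.
VERB_CLASSES = {
    "say": {"say", "state", "mention", "note"},
    "ask": {"ask", "inquire", "wonder"},
    "suggest": {"suggest", "propose", "recommend"},
    # ... further classes covering the majority of verb occurrences
}

def classify_verb(verb):
    """Return the class of a quotation-introducing verb, or None for a
    "tail" verb not covered by the inventory."""
    v = verb.lower()
    for cls, members in VERB_CLASSES.items():
        if v in members:
            return cls
    return None
```

A lookup like `classify_verb("propose")` resolves to the `"suggest"` class, while an uncovered verb returns `None`, mirroring the covered-head/uncovered-tail distinction the abstract draws.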