Human-written, high-quality extractive summaries pay great attention to the text that intermixes the extracts. In this work, we focused on the lexical choice of verbs that introduce quoted text. We analyzed over 4,000 high-quality summaries from a high-traffic mailing list and manually assembled 39 classes of quotation-introducing verbs that cover the majority of verb occurrences. A significant amount of the data is covered by ongoing work on e-mail "speech acts." However, we found that one third of the "tail" is composed of "risky" verbs that will most likely remain beyond the state of the art for some time. We use this fact to highlight the trade-offs of risk taking in NLG, where interesting prose may come at the cost of unsettling some readers.
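The verb-class inventory described above can be pictured as a simple lookup from a quotation-introducing verb to its class, with uncovered "tail" verbs falling through. This is only a minimal illustrative sketch: the class names and verb assignments below are invented for demonstration, not taken from the paper's 39 classes.

```python
# Hypothetical sketch of a quotation-introducing verb-class inventory.
# Class names and member verbs are illustrative placeholders; the paper's
# actual inventory comprises 39 manually assembled classes.
VERB_CLASSES = {
    "say": {"say", "state", "mention", "note"},
    "ask": {"ask", "inquire", "wonder"},
    "suggest": {"suggest", "propose", "recommend"},
    # ... further classes covering the majority of verb occurrences
}

def classify_verb(verb):
    """Return the class of a quotation-introducing verb, or None for a
    "tail" verb not covered by the inventory."""
    v = verb.lower()
    for cls, members in VERB_CLASSES.items():
        if v in members:
            return cls
    return None
```

A lookup like `classify_verb("propose")` resolves to the `"suggest"` class, while an uncovered verb returns `None`, mirroring the covered-head/uncovered-tail distinction the abstract draws.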