An ideal summarization system should produce summaries with both high content coverage and high linguistic quality. Many state-of-the-art summarization systems focus on content coverage by extracting content-dense sentences from source articles; a current research focus is post-processing these sentences so that they read fluently as a whole. The current AESOP task encourages research on evaluating summaries along three dimensions: content, readability, and overall responsiveness. In this work, we adapt a machine translation metric to measure content coverage, apply an enhanced discourse coherence model to evaluate summary readability, and combine both in a trained regression model to predict overall responsiveness. The results show significantly improved performance over the metrics submitted to AESOP 2011.
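The combination step described above can be illustrated with a small sketch: a linear regression that maps a (content score, readability score) pair to a predicted overall responsiveness score. This is a minimal, hypothetical illustration using ordinary least squares via the normal equations; the feature values, training data, and model form here are placeholders, not the paper's actual metrics or data.

```python
def fit_linear(features, targets):
    """Fit y ~ w0 + w1*x1 + w2*x2 by ordinary least squares.

    Solves the normal equations (X^T X) w = X^T y with Gaussian
    elimination and partial pivoting. `features` is a list of
    (x1, x2) pairs; `targets` is a list of scores.
    """
    X = [[1.0, x1, x2] for x1, x2 in features]  # prepend intercept
    n = 3
    # Build A = X^T X and b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, n))) / A[r][r]
    return w

def predict(w, content_score, readability_score):
    """Predicted overall responsiveness from the two metric scores."""
    return w[0] + w[1] * content_score + w[2] * readability_score

# Toy (hypothetical) training data: per-summary (content, readability)
# metric scores paired with human responsiveness judgments.
train_X = [(0.2, 0.3), (0.5, 0.4), (0.7, 0.8), (0.9, 0.6), (0.4, 0.9)]
train_y = [1.5, 2.5, 4.0, 4.2, 3.0]
weights = fit_linear(train_X, train_y)
score = predict(weights, 0.6, 0.7)
```

In practice one would train such a model on human responsiveness judgments from a previous evaluation cycle and apply it to new system summaries; any off-the-shelf regression learner could replace the hand-rolled solver here.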