EUSUM: extracting easy-to-understand English summaries for non-native readers

Authors:
Xiaojun Wan;Huiying Li;Jianguo Xiao
Affiliations:
Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010

Abstract

In this paper we investigate a novel and important problem in multi-document summarization: how to extract an easy-to-understand English summary for non-native readers. Existing summarization systems extract the same kind of English summary from English news documents for both native and non-native readers. However, non-native readers differ in their English reading skills because they have different English education and learning backgrounds. An English summary that native readers understand easily may be hard for non-native readers to understand. We propose adding the dimension of reading easiness (or difficulty) to multi-document summarization; the proposed EUSUM system produces easy-to-understand summaries according to the English reading skills of the readers. Sentence-level reading easiness is predicted using SVM regression, and the reading easiness score of each sentence is then incorporated into the summarization process. An empirical evaluation and a user study show that EUSUM produces summaries that are easier for non-native readers to understand than those of existing summarization systems, with very little sacrifice of informativeness.
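
The idea of incorporating a per-sentence easiness score into sentence selection can be illustrated with a minimal Python sketch. It assumes each sentence already carries an informativeness score and a predicted easiness score, both scaled to [0, 1]; in the paper the easiness score comes from SVM regression, which is not reproduced here. The `Sentence` class, the linear weighting with `lam`, and the greedy word-budget selection are illustrative assumptions, not the EUSUM system's actual formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    informativeness: float  # assumed to be in [0, 1], from any summarization ranker
    easiness: float         # assumed to be in [0, 1], from a readability regressor


def combined_score(sentence: Sentence, lam: float) -> float:
    """Linearly trade off informativeness against reading easiness.

    lam = 0 ignores easiness entirely; larger lam favours easier sentences.
    This weighting scheme is an assumption made for illustration.
    """
    return (1.0 - lam) * sentence.informativeness + lam * sentence.easiness


def select_summary(sentences: List[Sentence], lam: float, max_words: int) -> List[Sentence]:
    """Greedily pick the highest-scoring sentences that fit a word budget."""
    ranked = sorted(sentences, key=lambda s: combined_score(s, lam), reverse=True)
    summary: List[Sentence] = []
    used_words = 0
    for sentence in ranked:
        length = len(sentence.text.split())
        # Skip any sentence that would exceed the budget and try the next one.
        if used_words + length <= max_words:
            summary.append(sentence)
            used_words += length
    return summary


if __name__ == "__main__":
    candidates = [
        Sentence("Fiscal austerity exacerbated macroeconomic volatility.", 0.9, 0.2),
        Sentence("Spending cuts hurt growth.", 0.6, 0.9),
        Sentence("The minister spoke to reporters on Monday afternoon.", 0.3, 0.5),
    ]
    for lam in (0.0, 0.5):
        chosen = select_summary(candidates, lam, max_words=5)
        print(lam, [s.text for s in chosen])
```

With a small word budget, raising `lam` shifts the selection from the most informative sentence toward one that is nearly as informative but easier to read, which is the trade-off the abstract describes.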