The diversity-based approach to open-domain text summarization

Authors:
Tadashi Nomoto;Yuji Matsumoto
Affiliations:
National Institute of Japanese Literature, 1-16-10 Yutaka Shinagawa, Tokyo 142-8585, Japan;Nara Institute of Science and Technology, 8916-5 Takayama Ikoma, Nara 630-0129, Japan
Venue:
Information Processing and Management: an International Journal
Year:
2003

Citing 18
Cited 6

C4.5: programs for machine learning

C4.5: programs for machine learning
A trainable document summarizer

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Stochastic complexity in learning

Journal of Computer and System Sciences - Special issue: 26th annual ACM symposium on the theory of computing & STOC'94, May 23–25, 1994, and second annual Europe an conference on computational learning theory (EuroCOLT'95), March 13–15, 1995
The use of MMR, diversity-based reranking for reordering documents and producing summaries

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The automatic construction of large-scale corpora for summarization research

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
New Methods in Automatic Extracting

Journal of the ACM (JACM)
Generic text summarization using relevance measure and latent semantic analysis

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
A new approach to unsupervised text summarization

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating Natural Language Processing Systems: An Analysis and Review

Evaluating Natural Language Processing Systems: An Analysis and Review
Refining Initial Points for K-Means Clustering

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Comparing the Minimum Description Length Principle and Boosting in the Automatic Analysis of Discourse

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
X-means: Extending K-means with Efficient Estimation of the Number of Clusters

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
Distribution of content words and phrases in text and language modelling

Natural Language Engineering
The rhetorical parsing of natural language texts

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Fast generation of abstracts from general domain text corpora by extracting relevant sentences

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Query-relevant summarization using FAQs

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

NAACL-ANLP-AutoSum '00 Proceedings of the 2000 NAACL-ANLPWorkshop on Automatic summarization - Volume 4

Supervised ranking in open-domain text summarization

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Summary of FAQs from a topical forum based on the native composition structure

Expert Systems with Applications: An International Journal
pSum-SaDE: a modified p-median problem and self-adaptive differential evolution algorithm for text summarization

Applied Computational Intelligence and Soft Computing
Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling

Information Processing and Management: an International Journal
MCMR: Maximum coverage and minimum redundant text summarization model

Expert Systems with Applications: An International Journal
DESAMC+DocSum: Differential evolution with self-adaptive mutation and crossover parameters for multi-document summarization

Knowledge-Based Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The paper introduces a novel approach to unsupervised text summarization, which in principle should work for any domain or genre. The novelty lies in exploiting the diversity of concepts in text for summarization, which has not received much attention in the summarization literature. We propose, in addition, what we call the information-centric approach to evaluation, where the quality of summaries is judged not in terms of how well they match human-created summaries but in terms of how well they represent their source documents in IR tasks such document retrieval and text categorization. To find the effectiveness of our approach under the proposed evaluation scheme, we set out to examine how a system with the diversity functionality performs against one without, using the test data known as BMIR-J2. The results demonstrate a clear superiority of the diversity-based approach to a non-diversity-based approach.The paper also addresses the question of how closely the diversity approach models human judgments on summarization. We have created a relatively large volume of data annotated for relevance to summarization by human subjects. We have trained a decision tree-based summarizer using the data, and examined how the diversity method compares with the supervised method in performance when tested on the data. It was found that the diversity approach performs as well as and in some cases superior to the supervised method.