Multi-document summarization of scientific corpora

Authors: Ozge Yeloglu; Evangelos Milios; Nur Zincir-Heywood
Affiliation: Dalhousie University, Halifax, Canada (all three authors)
Venue: Proceedings of the 2011 ACM Symposium on Applied Computing
Year: 2011

Abstract

In this paper, we investigate four approaches to summarizing scientific corpora when only gold-standard keyterms are available: MEAD with its built-in default vocabulary; MEAD with a corpus-specific vocabulary extracted by the Keyphrase Extraction Algorithm (KEA); LexRank, a state-of-the-art summarization algorithm based on a random walk over a sentence-similarity graph; and W3SS, a summarization algorithm based on keyword density. All four are tested on two collections of Computer Science research papers. Because no gold-standard summaries are available for our data, we use a content evaluation method, the pyramid method, instead of the well-known ROUGE metrics. Evaluation with the pyramid method indicates that adding a corpus-specific vocabulary to traditional summarization methods improves performance, but not significantly. On the other hand, visual inspection shows that current content evaluation methods, which use only gold-standard keyterm information, are not intuitive, and that focus must turn to better evaluation techniques, especially for the multi-document summarization problem. Although the pyramid method looks for important keyterms in the resulting summaries, it cannot distinguish a general introductory sentence about the area from a specific sentence on the core idea if both contain the same keyterm. Finally, our results show that LexRank is not feasible for scientific corpus summarization because of its high computational cost.
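The abstract compresses two technical ideas: LexRank ranks sentences by a random walk over a sentence-similarity graph, and the keyterm-based pyramid evaluation rewards a summary for the gold-standard keyterms it contains. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It uses plain term-count cosine similarity rather than the idf-modified cosine of the original LexRank, and the threshold (0.1), damping factor (0.85), keyterm weights and the function names `lexrank` and `keyterm_score` are hypothetical choices for illustration. The scorer divides the weight of the distinct keyterms a summary covers by the largest weight achievable with the same number of keyterms, a simplified pyramid-style normalization that may differ from the procedure used in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive lowercase word tokenizer (illustrative only; no stemming or stopword removal).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity of two bag-of-words Counters using raw term counts (no idf weighting).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, max_iter=100, tol=1e-8):
    """LexRank-style sentence centrality: a damped random walk over a
    thresholded sentence-similarity graph, computed by power iteration."""
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    # Every pair of sentences is compared: O(n^2) similarity computations.
    adj = [[1.0 if cosine(bags[i], bags[j]) >= threshold else 0.0 for j in range(n)]
           for i in range(n)]
    degree = [max(sum(row), 1.0) for row in adj]  # guard against empty sentences
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Each node keeps (1 - damping) / n and receives its neighbours' share of score.
        new = [(1 - damping) / n
               + damping * sum(adj[j][i] * scores[j] / degree[j] for j in range(n))
               for i in range(n)]
        converged = sum(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def keyterm_score(summary, keyterm_weights):
    """Simplified pyramid-style score: total weight of the distinct gold keyterms
    found in the summary, divided by the best total weight achievable with the
    same number of keyterms."""
    padded = " " + " ".join(tokenize(summary)) + " "
    # Match whole token sequences so that "graph" does not match inside "paragraph".
    covered = [k for k in keyterm_weights
               if " " + " ".join(tokenize(k)) + " " in padded]
    if not covered:
        return 0.0
    observed = sum(keyterm_weights[k] for k in covered)
    best = sum(sorted(keyterm_weights.values(), reverse=True)[:len(covered)])
    return observed / best


if __name__ == "__main__":
    sentences = [
        "LexRank ranks sentences by a random walk on a similarity graph.",
        "The random walk on the sentence similarity graph gives centrality scores.",
        "Pyramid evaluation counts keyterms in the summary.",
    ]
    print(lexrank(sentences))

    # Hypothetical gold-standard keyterms with hypothetical weights.
    keyterms = {"keyphrase extraction": 3, "summarization": 2, "pyramid method": 1}
    general = "Summarization is a long-standing problem in natural language processing."
    specific = "We add a corpus specific vocabulary to summarization of research papers."
    # Both sentences contain only "summarization", so the scores are identical.
    print(keyterm_score(general, keyterms), keyterm_score(specific, keyterms))
```

In the demo, the general introductory sentence and the more specific sentence both mention only the keyterm "summarization", so they receive the same score, which is the limitation the abstract describes. In this sketch the nested loop that fills `adj` compares every pair of sentences, so its cost grows quadratically with the number of sentences, one reason graph-based ranking becomes expensive on whole corpora.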