A zipf-like distant supervision approach for multi-document summarization using wikinews articles

Authors:
Felipe Bravo-Marquez;Manuel Manriquez
Affiliations:
Department of Computer Science, University of Chile, Chile;University of Santiago of Chile, Chile
Venue:
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Year:
2012

Citing 19
Cited 0

Summarizing text documents: sentence selection and evaluation metrics

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Models for metasearch

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Advances in Automatic Text Summarization

Advances in Automatic Text Summarization
Generating Text Summaries through the Relative Importance of Topics

IBERAMIA-SBIA '00 Proceedings of the International Joint Conference, 7th Ibero-American Conference on AI: Advances in Artificial Intelligence
Training linear SVMs in linear time

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Feature selection for ranking

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Learning query-biased web page summarization

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Multi-document summarization using cluster-based link analysis

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval

Introduction to Information Retrieval
Correlation between ROUGE and human evaluation of extractive meeting summaries

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
A SVM-Based Ensemble Approach to Multi-Document Summarization

Canadian AI '09 Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence
Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

NAACL-ANLP-AutoSum '00 Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization
LexRank: graph-based lexical centrality as salience in text summarization

Journal of Artificial Intelligence Research
The automatic creation of literature abstracts

IBM Journal of Research and Development
Distant supervision for relation extraction without labeled data

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Boilerplate detection using shallow text features

Proceedings of the third ACM international conference on Web search and data mining
Many are better than one: improving multi-document summarization via weighted consensus

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
A new approach to improving multilingual summarization using a genetic algorithm

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Hypergeometric language model and zipf-like scoring function for web document similarity retrieval

SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Due to the difficulty of obtaining large training datasets formed by document clusters and their respective human-made summaries, we propose building a training and a testing corpus from Wikinews. Wikinews articles are modeled as "distant" summaries of their cited sources, considering that first sentences of Wikinews articles tend to summarize the event covered in the news story. Sentences from cited sources are represented as tuples of numerical features and labeled according to a relationship with the given distant summary that is based on the Zipf law. Ranking functions are trained using linear regressions and ranking SVMs, which are also combined using Borda count. Top ranked sentences are concatenated and used to build summaries, which are compared with the first sentences of the distant summary using ROUGE evaluation measures. Experimental results obtained show the effectiveness of the proposed method and that the combination of different ranking techniques outperforms the quality of the generated summary.