Temporal corpus summarization using submodular word coverage

Authors:
Ruben Sipos; Adith Swaminathan; Pannaga Shivaswamy; Thorsten Joachims
Affiliations:
Cornell University, Ithaca, NY, USA (all authors)
Venue:
Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Year:
2012

Abstract

In many areas of life, we now have almost complete electronic archives reaching back well over two decades. These include, for example, the body of research papers in computer science, all news articles written in the US, and most people's personal email. However, we have only limited methods for analyzing and understanding these collections. While keyword-based retrieval systems allow efficient access to individual documents in archives, we still lack methods for understanding a corpus as a whole. In this paper, we explore methods that provide a temporal summary of such corpora in terms of landmark documents, authors, and topics. In particular, we explicitly model the temporal nature of influence between documents and re-interpret summarization as a coverage problem over words anchored in time. The resulting models provide monotone submodular objectives for computing informative and non-redundant summaries over time, which can be efficiently optimized with greedy algorithms. Our empirical study shows the effectiveness of our approach over several baselines.
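
The idea of summarization as word coverage anchored in time, optimized greedily, can be illustrated with a minimal sketch. This is not the authors' objective, which the abstract does not specify. It assumes a simplified, hypothetical model: each document is a (year, set of words) pair, and a selected document "covers" a word occurrence in any document published in the same year or later that shares the word. Counting covered (word, document) pairs gives a coverage function, which is monotone and submodular, so the standard greedy algorithm under a cardinality budget attains the well-known (1 - 1/e) approximation guarantee. The names `covered_pairs` and `greedy_summary` and the toy corpus are illustrative only.

```python
def covered_pairs(i, docs):
    """Return the (word, doc_index) pairs that document i covers.

    Assumption: document i covers word w in document j when both contain w
    and j was published no earlier than i.
    """
    year_i, words_i = docs[i]
    return {
        (w, j)
        for j, (year_j, words_j) in enumerate(docs)
        if year_j >= year_i
        for w in words_i & words_j
    }


def greedy_summary(docs, k):
    """Greedily pick up to k documents maximizing the number of covered pairs."""
    candidates = {i: covered_pairs(i, docs) for i in range(len(docs))}
    selected, covered = [], set()
    for _ in range(min(k, len(docs))):
        remaining = [i for i in candidates if i not in selected]
        # Marginal gain of a candidate = number of newly covered pairs.
        best = max(remaining, key=lambda i: len(candidates[i] - covered))
        if not candidates[best] - covered:
            break  # no candidate adds coverage; stop early
        selected.append(best)
        covered |= candidates[best]
    return selected, len(covered)


# Toy corpus (hypothetical data): (publication year, set of words).
docs = [
    (2000, {"svm", "kernel"}),
    (2001, {"svm", "kernel", "margin"}),
    (2003, {"topic", "lda"}),
    (2004, {"topic", "lda", "svm"}),
    (2005, {"kernel", "margin"}),
]

selected, value = greedy_summary(docs, k=2)
print(selected, value)
```

In this toy corpus the greedy procedure first picks an early document whose vocabulary recurs in later ones, then a document from a different topic, because a second document on the same topic adds little new coverage. That diminishing marginal gain is what discourages redundant picks.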