Sumblr: continuous summarization of evolving tweet streams

Authors:
Lidan Shou;Zhenhua Wang;Ke Chen;Gang Chen
Affiliations:
Zhejiang University, Hangzhou, China (all authors)
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. In their raw form, tweets can be incredibly informative, but also overwhelming. For both end-users and data analysts, it is a nightmare to plow through millions of tweets that contain an enormous amount of noise and redundancy. In this paper, we study continuous tweet summarization as a solution to this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors (TCVs). Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.
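
The abstract does not define the clustering algorithm or the contents of a Tweet Cluster Vector, so the following is only a rough illustration of the general idea of one-pass stream clustering with compact per-cluster statistics. It is not the authors' method. The class `ClusterStats`, the function `cluster_stream`, whitespace tokenization, cosine similarity, and the `threshold` value are all assumptions made for this sketch.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClusterStats:
    """Hypothetical per-cluster summary (NOT the paper's TCV definition)."""
    term_sum: Counter = field(default_factory=Counter)  # summed term counts
    count: int = 0          # number of tweets absorbed so far
    last_seen: float = 0.0  # timestamp of the most recent tweet

    def add(self, terms: Counter, ts: float) -> None:
        # Update the summary incrementally; the raw tweet is not stored.
        self.term_sum.update(terms)
        self.count += 1
        self.last_seen = max(self.last_seen, ts)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_stream(tweets, threshold=0.3):
    """Assign each (timestamp, text) pair to its most similar cluster.

    A new cluster is opened when no existing cluster reaches `threshold`
    (an assumed, illustrative value).
    """
    clusters = []
    for ts, text in tweets:
        terms = Counter(text.lower().split())  # assumed naive tokenizer
        best, best_sim = None, 0.0
        for c in clusters:
            # Cosine is scale-invariant, so the summed vector stands in
            # for the cluster centroid.
            sim = cosine(terms, c.term_sum)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = ClusterStats()
            clusters.append(best)
        best.add(terms, ts)
    return clusters


stream = [
    (1.0, "earthquake hits city center"),
    (2.0, "big earthquake hits the city"),
    (3.0, "new phone launch today"),
    (4.0, "phone launch event today"),
]
clusters = cluster_stream(stream)
```

On this toy stream, the two earthquake tweets and the two phone-launch tweets end up in separate clusters. Each tweet is processed once and only the aggregate statistics are retained, which is the property that makes this style of clustering suitable for fast-arriving streams; a summarizer could then rank or select content per cluster from those statistics.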