BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Generic text summarization using relevance measure and latent semantic analysis
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Centroid-based summarization of multiple documents
Information Processing and Management: an International Journal
Automatic evaluation of summaries using N-gram co-occurrence statistics
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
2005 Special Issue: Efficient streaming text clustering
Neural Networks - 2005 Special issue: IJCNN 2005
A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Multi-document summarization using cluster-based link analysis
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
LexRank: graph-based lexical centrality as salience in text summarization
Journal of Artificial Intelligence Research
Multi-document summarization by maximizing informative content-words
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
On clustering massive text and categorical data streams
Knowledge and Information Systems
Summarizing microblogs automatically
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Text stream clustering algorithm based on adaptive feature selection
Expert Systems with Applications: An International Journal
Twitinfo: aggregating and visualizing microblogs for event exploration
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
TI: an efficient indexing mechanism for real-time search on tweets
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Evolutionary timeline summarization: a balanced optimization framework via iterative substitution
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Discovering geographical topics in the twitter stream
Proceedings of the 21st international conference on World Wide Web
A framework for summarizing and analyzing twitter feeds
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Generating event storylines from microblogs
Proceedings of the 21st ACM international conference on Information and knowledge management
Hi-index | 0.00 |
With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in its raw form can be incredibly informative, but also overwhelming. For both end-users and data analysts it is a nightmare to plow through millions of tweets which contain enormous noises and redundancies. In this paper, we study continuous tweet summarization as a solution to address this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors. Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.