The Journal of Machine Learning Research
Online Amnesic Approximation of Streaming Time Series
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Mining compressed frequent-pattern sets
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Summarizing itemset patterns using probabilistic models
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
An event-based framework for characterizing the evolutionary behavior of interaction graphs
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Succinct summarization of transactional databases: an overlapped hyperrectangle scheme
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent items in a stream using flexible windows
Intelligent Data Analysis - Knowledge Discovery from Data Streams
StreamKrimp: Detecting Change in Data Streams
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Meme-tracking and the dynamics of the news cycle
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Patterns of temporal variation in online media
Proceedings of the fourth ACM international conference on Web search and data mining
Smoothing techniques for adaptive online language models: topic tracking in tweet streams
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Extracting insights from social media with large-scale matrix approximations
IBM Journal of Research and Development
Hierarchical clustering in improving microblog stream summarization
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
Sumblr: continuous summarization of evolving tweet streams
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A Twitter-based smoking cessation recruitment system
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Hi-index | 0.00 |
The firehose of data generated by users on social networking and microblogging sites such as Facebook and Twitter is enormous. Real-time analytics on such data is challenging with most current efforts largely focusing on the efficient querying and retrieval of data produced recently. In this paper, we present a dynamic pattern driven approach to summarize data produced by Twitter feeds. We develop a novel approach to maintain an in-memory summary while retaining sufficient information to facilitate a range of user-specific and topic-specific temporal analytics. We empirically compare our approach with several state-of-the-art pattern summarization approaches along the axes of storage cost, query accuracy, query flexibility, and efficiency using real data from Twitter. We find that the proposed approach is not only scalable but also outperforms existing approaches by a large margin.