The popularity of blogs has increased dramatically over the last couple of years. As topics evolve in the blogosphere, keywords align and form the heart of various stories. Intuitively, we expect that in certain contexts, when a specific topic or event is heavily discussed, a set of keywords will be correlated: the keywords in the set will frequently appear together (pairwise or in conjunction), forming a cluster. Such keyword clusters are temporal (associated with specific time periods) and transient: as topics recede, the associated keyword clusters dissolve because their keywords no longer frequently appear together. In this paper, we formalize this intuition and present efficient algorithms to identify keyword clusters in large collections of blog posts for specific temporal intervals. We then formalize problems related to the temporal properties of such clusters and, in particular, present efficient algorithms to identify clusters that persist over time. Given the vast amounts of data involved, we present algorithms that are fast (they can efficiently process millions of blogs with multiple millions of posts) and take special care to make them efficiently realizable in secondary storage. Although we instantiate our techniques in the context of blogs, our methodology is generic enough to apply equally well to any temporally ordered text source. We present the results of an experimental study using both real and synthetic data sets, demonstrating the efficiency of our algorithms and the quality of the keyword clusters and associated temporal properties we identify.
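To make the intuition of a temporal keyword cluster concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a simple co-occurrence count threshold in place of a formal correlation measure, treats connected components of the resulting keyword graph as clusters, and holds everything in memory. The function name `keyword_clusters`, the `min_cooccurrence` parameter, and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_clusters(posts, start, end, min_cooccurrence=2):
    """Group keywords that co-occur in at least `min_cooccurrence` posts
    whose timestamp falls in the half-open interval [start, end).

    `posts` is an iterable of (timestamp, keywords) pairs (assumed layout).
    """
    pair_counts = Counter()
    for timestamp, keywords in posts:
        if start <= timestamp < end:
            # Count each unordered keyword pair once per post.
            for pair in combinations(sorted(set(keywords)), 2):
                pair_counts[pair] += 1

    # Build an undirected graph containing only the frequent pairs.
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)

    # Report each connected component of that graph as one cluster.
    clusters, seen = [], set()
    for node in sorted(graph):
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy data: timestamps are day numbers, keywords are already extracted.
posts = [
    (1, ["election", "vote", "debate"]),
    (1, ["election", "vote"]),
    (1, ["storm", "flood"]),
    (1, ["storm", "flood", "vote"]),
    (2, ["election", "storm"]),
]
print(keyword_clusters(posts, start=1, end=2))
print(keyword_clusters(posts, start=2, end=3))
```

On this toy input, day 1 yields two clusters (`election`/`vote` and `storm`/`flood`), while day 2 yields none, which illustrates how a cluster can dissolve once its keywords stop appearing together.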