Dense subgraph maintenance under streaming edge weight updates for real-time story identification

Authors:
Albert Angel;Nikos Sarkas;Nick Koudas;Divesh Srivastava
Affiliations:
University of Toronto;University of Toronto;University of Toronto;AT&T Labs-Research
Venue:
Proceedings of the VLDB Endowment
Year:
2012

Abstract

Recent years have witnessed an unprecedented proliferation of social media. Every day, people around the globe author millions of blog posts, micro-blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point in time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DynDens, which outperforms adaptations of existing techniques to this setting and yields meaningful results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.
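To make the idea of a single edge weight update changing a subgraph's density concrete, here is a minimal sketch. It is not DynDens and does not reproduce the paper's results. It assumes one simple density definition (total edge weight divided by the number of vertex pairs), which may differ from the definitions the paper covers. The function names `density` and `apply_update` and the entity labels are hypothetical.

```python
from itertools import combinations


def density(weights, nodes):
    """Average edge weight over all vertex pairs in `nodes`.

    Assumed definition for illustration only; missing edges count as 0.
    """
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    total = sum(weights.get(frozenset(p), 0.0) for p in pairs)
    return total / len(pairs)


def apply_update(weights, u, v, delta):
    """Apply one streaming update: add `delta` to the weight of edge (u, v)."""
    key = frozenset((u, v))
    weights[key] = weights.get(key, 0.0) + delta


# Build a small entity graph from a stream of updates.
weights = {}
apply_update(weights, "a", "b", 1.0)
apply_update(weights, "b", "c", 0.5)
apply_update(weights, "a", "c", 0.5)

subgraph = {"a", "b", "c"}
before = density(weights, subgraph)

# A single update on an edge inside the subgraph.
apply_update(weights, "a", "b", 0.3)
after = density(weights, subgraph)

# Under this definition the change equals delta divided by the number of pairs.
change = after - before
```

Under this assumed definition, an update of size delta on an edge inside a subgraph with n vertices shifts that subgraph's density by delta / (n(n-1)/2), and subgraphs that do not contain both endpoints are unaffected. The sketch recomputes the density from scratch on each call, so it illustrates the quantity being maintained rather than an efficient way to maintain it.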