ACM SIGKDD Explorations Newsletter
Magrathea: Building and Analyzing Ubiquitous and Social Systems
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
TAGs: scalable threshold-based algorithms for proximity computation in graphs
Proceedings of the 14th International Conference on Extending Database Technology
A supervised approach to detect protein complex by combining biological and topological properties
International Journal of Data Mining and Bioinformatics
Hi-index | 0.00 |
Graphs show up in a surprisingly diverse set of disciplines, ranging from computer networks to sociology, biology, ecology and many more. How do such “normal” graphs look like? How can we spot abnormal subgraphs within them? Which nodes/edges are “suspicious?” How does a virus spread over a graph? Answering these questions is vital for outlier detection (such as terrorist cells, money laundering rings), forecasting, simulations (how well will a new protocol work on a realistic computer network?), immunization campaigns and many other applications. We attempt to answer these questions in two parts. First, we answer questions targeted at applications : what patterns/properties of a graph are important for solving specific problems? Here, we investigate the propagation behavior of a computer virus over a network, and find a simple formula for the epidemic threshold (beyond which any viral outbreak might become an epidemic). We find an “information survival threshold” which determines whether, in a sensor or P2P network with failing nodes and links, a piece of information will survive or not. We also develop a scalable, parameter-free method for finding groups of “similar” nodes in a graph, corresponding to homogeneous regions (or CrossAssociations) in the binary adjacency matrix of the graph. This can help navigate the structure of the graph, and find un-obvious patterns. In the second part of our work, we investigate recurring patterns in real-world graphs, to gain a deeper understanding of their structure. This leads to the development of the R-MAT model of graph generation for creating synthetic but “realistic” graphs, which match many of the patterns found in real-world graphs, including power-law and lognormal degree distributions, small diameter and “community” effects.