ANF: a fast and scalable tool for data mining in massive graphs

Authors:
Christopher R. Palmer;Phillip B. Gibbons;Christos Faloutsos
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Intel Research Pittsburgh, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002


Abstract

Graphs are an increasingly important data source; prominent examples include the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following: How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk-resident graphs. Finally, we present some of our results from mining large graphs using ANF.
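
To make the idea of a "neighbourhood function" concrete, the sketch below computes it exactly with one breadth-first search per node and then approximates it with probabilistic counting bitmasks. The abstract does not spell out the algorithm, so this is an illustration under stated assumptions rather than the authors' implementation: it assumes N(h) counts ordered node pairs (u, v), including u = v, with distance at most h, and it assumes a Flajolet-Martin style counter for the approximation. All function names and the parameters k, width and seed are hypothetical.

```python
import random
from collections import deque

PHI = 0.77351  # correction constant from Flajolet-Martin probabilistic counting


def exact_neighbourhood(adj, max_h):
    """Return [N(0), ..., N(max_h)], where N(h) is the number of ordered
    pairs (u, v) with dist(u, v) <= h (assumed convention, includes u == v)."""
    counts = [0] * (max_h + 1)
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == max_h:      # do not explore beyond max_h hops
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            counts[d] += 1
    result, total = [], 0
    for c in counts:                  # prefix sums turn "exactly h" into "<= h"
        total += c
        result.append(total)
    return result


def fm_bit(rng, width):
    """Mask with a single bit set; bit i is chosen with probability ~2^-(i+1)."""
    i = 0
    while i < width - 1 and rng.random() < 0.5:
        i += 1
    return 1 << i


def lowest_zero(mask):
    """Index of the least significant zero bit of mask."""
    b = 0
    while mask & 1:
        mask >>= 1
        b += 1
    return b


def approx_neighbourhood(adj, max_h, k=64, width=32, seed=0):
    """Estimate [N(0), ..., N(max_h)] using k bitmasks per node.
    Each round ORs every node's masks with those of its neighbours, so after
    h rounds a node's masks summarise everything within h hops of it."""
    rng = random.Random(seed)
    masks = {u: [fm_bit(rng, width) for _ in range(k)] for u in adj}
    estimates = []
    for h in range(max_h + 1):
        if h > 0:
            new_masks = {}
            for u in adj:
                merged = list(masks[u])
                for v in adj[u]:
                    for j in range(k):
                        merged[j] |= masks[v][j]
                new_masks[u] = merged
            masks = new_masks
        total = 0.0
        for u in adj:
            avg_b = sum(lowest_zero(m) for m in masks[u]) / k
            total += (2 ** avg_b) / PHI
        estimates.append(total)
    return estimates


if __name__ == "__main__":
    # Undirected path 0 - 1 - 2 - 3, stored as symmetric adjacency lists.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(exact_neighbourhood(path, 3))   # [4, 10, 14, 16]
    print(approx_neighbourhood(path, 3))  # rough estimates; biased on tiny graphs
```

The exact version needs a full traversal per node, while the bitmask version only makes one pass over the edges per hop and keeps a fixed number of small integers per node, which is the kind of trade-off the abstract describes. This style of estimator is known to be inaccurate for very small neighbourhoods, so the toy graph above is only useful for checking the mechanics.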