Mining graph patterns efficiently via randomized summaries

Authors:
Chen Chen;Cindy X. Lin;Matt Fredrikson;Mihai Christodorescu;Xifeng Yan;Jiawei Han
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Wisconsin at Madison;IBM T. J. Watson Research Center;University of California at Santa Barbara;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

Graphs are prevalent in many domains, such as bioinformatics, social networks, the Web, and cyber-security. Graph pattern mining has become an important tool for managing and analyzing complexly structured data, with applications that include indexing, clustering, and classification. Existing graph mining algorithms have achieved great success by exploiting various properties of the pattern space. Unfortunately, because subgraph isomorphism plays a fundamental role in these methods, they may all run into trouble when the cost of enumerating a huge set of isomorphic embeddings blows up, especially in large graphs. The solution we propose resorts to reduction in the data space: for each graph, we build a summary and mine this shrunken graph instead. Compared with other data reduction techniques, which either reduce the number of transactions or compress between transactions, this new framework, called Summarize-Mine, suggests a third path: compressing within transactions. Summarize-Mine is effective at cutting down the size of graphs, and thus decreases the embedding enumeration cost. However, compression may also lose patterns. We address this issue by generating randomized summaries and repeating the process for multiple rounds; the main idea is that true patterns are unlikely to be missed in all rounds. We provide strict probabilistic guarantees on the likelihood of pattern loss. Experiments on real malware trace data show that Summarize-Mine is very efficient and can find interesting malware fingerprints that were not revealed previously.
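
The summarize-and-repeat idea can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the paper's summarization rule or its exact guarantee. The sketch assumes that a summary is built by merging same-label nodes into random groups of a fixed size, that edges falling inside a group are dropped, and that rounds are independent, so a pattern missed with probability at most q per round is missed in all t rounds with probability at most q^t. The names summarize, rounds_needed, and group_size are hypothetical.

```python
import math
import random
from collections import defaultdict


def summarize(labels, edges, group_size, rng):
    """Build one randomized summary of a labeled, undirected graph.

    Assumption (not from the paper): nodes sharing a label are shuffled
    and merged into groups of at most `group_size` nodes each.
    """
    by_label = defaultdict(list)
    for node, label in labels.items():
        by_label[label].append(node)

    node_to_group = {}
    summary_labels = {}
    for label in sorted(by_label):
        nodes = sorted(by_label[label])
        rng.shuffle(nodes)  # the source of randomness in each round
        for i in range(0, len(nodes), group_size):
            gid = len(summary_labels)
            summary_labels[gid] = label
            for node in nodes[i:i + group_size]:
                node_to_group[node] = gid

    # Keep one edge between two groups if any of their members were linked.
    # Edges inside a single group are dropped (a simplification).
    summary_edges = set()
    for u, v in edges:
        gu, gv = node_to_group[u], node_to_group[v]
        if gu != gv:
            summary_edges.add((min(gu, gv), max(gu, gv)))
    return summary_labels, summary_edges


def rounds_needed(miss_per_round, target_miss):
    """Smallest t with miss_per_round ** t <= target_miss.

    Assumes independent rounds; both arguments must lie strictly in (0, 1).
    """
    if not (0 < miss_per_round < 1 and 0 < target_miss < 1):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(target_miss) / math.log(miss_per_round))


# A toy graph: four 'a' nodes and two 'b' nodes.
labels = {0: "a", 1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
edges = [(0, 4), (1, 4), (2, 5), (3, 5), (0, 1)]
summary_labels, summary_edges = summarize(labels, edges, 2, random.Random(7))
```

In this toy graph the six nodes shrink to three summary nodes, which is the kind of within-transaction compression the abstract describes. Calling summarize with different seeds yields different groupings, so a pattern collapsed in one round may survive in another.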