Graphs are prevalent in many domains, such as bioinformatics, social networks, the Web, and cyber-security. Graph pattern mining has become an important tool for managing and analyzing data with complex structure, with applications that include indexing, clustering, and classification. Existing graph mining algorithms have achieved great success by exploiting various properties of the pattern space. Unfortunately, because subgraph isomorphism plays a fundamental role in these methods, they all run into trouble when the cost of enumerating a huge set of isomorphic embeddings blows up, especially in large graphs. The solution we propose reduces the data space instead. For each graph, we build a summary and mine this smaller graph in place of the original. Compared to other data reduction techniques, which either reduce the number of transactions or compress between transactions, this new framework, called Summarize-Mine, offers a third path: compressing within transactions. Summarize-Mine effectively cuts down the size of graphs, thus decreasing the embedding enumeration cost. However, compression can also lose patterns. We address this issue by generating randomized summaries and repeating the process for multiple rounds; the main idea is that true patterns are unlikely to be missed in every round. We provide strict probabilistic guarantees on the likelihood of pattern loss. Experiments on real malware trace data show that Summarize-Mine is very efficient and can find interesting malware fingerprints that were not revealed previously.
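
The multi-round idea can be illustrated with a short calculation. The sketch below rests on assumptions that the abstract does not state: rounds are treated as independent, and any single round is assumed to miss a given true pattern with probability at most `q`. Under those assumptions, the pattern is missed in all `t` rounds with probability at most `q**t`. The function names and parameters are hypothetical and are not taken from the paper, whose actual guarantee may take a different form.

```python
import math


def miss_probability_bound(q: float, rounds: int) -> float:
    """Upper bound on missing a true pattern in every one of `rounds` rounds.

    Assumption (not from the source): rounds are independent and each round
    misses the pattern with probability at most q.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return q ** rounds


def rounds_needed(q: float, epsilon: float) -> int:
    """Smallest number of rounds t such that q**t <= epsilon.

    Same independence assumption as above; q and epsilon must lie
    strictly between 0 and 1.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    # Solve q**t <= epsilon for t, then correct for floating-point rounding.
    t = max(1, math.ceil(math.log(epsilon) / math.log(q)))
    while q ** t > epsilon:
        t += 1
    while t > 1 and q ** (t - 1) <= epsilon:
        t -= 1
    return t
```

For instance, if a single round were to miss a pattern half of the time, `rounds_needed(0.5, 0.01)` gives the number of rounds after which the overall miss bound drops to one percent or below. The trade-off is that each extra round adds another summarize-and-mine pass, so smaller summaries (cheaper rounds, higher `q`) must be balanced against the number of rounds.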