Finding Frequent Patterns in a Large Sparse Graph

Authors:
Michihiro Kuramochi;George Karypis
Affiliations:
Department of Computer Science & Engineering, University of Minnesota, Minneapolis, USA 55455;Department of Computer Science & Engineering, University of Minnesota, Minneapolis, USA 55455
Venue:
Data Mining and Knowledge Discovery
Year:
2005

Abstract

Graph-based modeling has emerged as a powerful abstraction that captures, in a single unified framework, many of the relational, spatial, topological, and other characteristics present in a variety of datasets and application areas. Computationally efficient algorithms that find patterns corresponding to frequently occurring subgraphs play an important role in developing data mining-driven methodologies for analyzing the graphs that result from such datasets. This paper presents two algorithms, based on the horizontal and vertical pattern discovery paradigms, that find the connected subgraphs with a sufficient number of edge-disjoint embeddings in a single large, undirected, labeled, sparse graph. The algorithms use three different methods for determining the number of edge-disjoint embeddings of a subgraph, and they employ novel algorithms for candidate generation and frequency counting, which allow them to operate on datasets with different characteristics and to quickly prune unpromising subgraphs. Experimental evaluation on real datasets from various domains shows that both algorithms achieve good performance, scale well to sparse input graphs with more than 120,000 vertices or 110,000 edges, and significantly outperform previously developed algorithms.
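
To make the notion of counting edge-disjoint embeddings concrete, the following is a minimal sketch and not the paper's implementation. It assumes the embeddings of a pattern have already been enumerated and that each one is given as a collection of undirected edges of the input graph. It builds an overlap graph (one node per embedding, with two nodes joined when their embeddings share an edge) and applies a greedy minimum-degree heuristic for an independent set, which yields a lower bound on the number of pairwise edge-disjoint embeddings. The function name `greedy_edge_disjoint_count` and the input representation are hypothetical, and the abstract does not say which three counting methods the authors actually use.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def _normalize(edge: Edge) -> Edge:
    """Order the endpoints so that (u, v) and (v, u) compare equal."""
    u, v = edge
    return (u, v) if u <= v else (v, u)


def greedy_edge_disjoint_count(
    embeddings: Iterable[Iterable[Edge]],
) -> Tuple[int, List[int]]:
    """Greedy lower bound on the number of pairwise edge-disjoint embeddings.

    Each embedding is a collection of undirected edges of the input graph.
    Returns the count and the indices of the embeddings that were selected.
    This is an illustrative heuristic (hypothetical helper), not the
    method described in the paper.
    """
    edge_sets = [frozenset(_normalize(e) for e in emb) for emb in embeddings]
    n = len(edge_sets)

    # Overlap graph: embeddings i and j conflict when they share an edge.
    neighbors: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if edge_sets[i] & edge_sets[j]:
                neighbors[i].add(j)
                neighbors[j].add(i)

    alive = set(range(n))
    chosen: List[int] = []
    while alive:
        # Pick the remaining embedding with the fewest remaining conflicts;
        # ties are broken by the smaller index.
        v = min(alive, key=lambda x: (len(neighbors[x] & alive), x))
        chosen.append(v)
        # Discard the chosen embedding and everything that overlaps it.
        alive -= neighbors[v] | {v}
    return len(chosen), chosen


if __name__ == "__main__":
    # Two-edge path pattern embedded in the path graph 0-1-2-3-4.
    path_embeddings = [
        [(0, 1), (1, 2)],
        [(1, 2), (2, 3)],
        [(2, 3), (3, 4)],
    ]
    count, picked = greedy_edge_disjoint_count(path_embeddings)
    print(count, picked)
```

In this toy input the middle embedding overlaps both of its neighbours, so only the first and third can be kept together. A pattern would then be reported as frequent when this count meets a user-chosen threshold. Because the heuristic can undercount relative to an exact maximum independent set, it may miss some patterns that an exact method would report.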