Mining frequent cross-graph quasi-cliques

Authors:
Daxin Jiang; Jian Pei
Affiliations:
Microsoft Research Asia, Beijing, China; Simon Fraser University, Burnaby, BC, Canada
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2009

Abstract

Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different labs or during various biological processes may overcome the heavy noise in the data. Moreover, by joint mining of gene expression data and protein-protein interaction data, we may discover clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways. In this article, we investigate a novel data mining problem, mining frequent cross-graph quasi-cliques, which is generalized from several interesting applications in bioinformatics, cross-market customer segmentation, social network analysis, and Web mining. In a graph, a set of vertices S is a γ-quasi-clique (0 < γ ≤ 1) if each vertex v in S directly connects to at least γ ⋅ (|S| − 1) other vertices in S. Given a set of graphs G1, …, Gn and parameter min_sup (0 < min_sup ≤ 1), a set of vertices S is a frequent cross-graph quasi-clique if S is a γ-quasi-clique in at least min_sup ⋅ n graphs, and there does not exist a proper superset of S having the property. We build a general model, show why the complete set of frequent cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop practical algorithms which exploit several interesting and effective techniques and heuristics to efficaciously mine frequent cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real datasets. We demonstrate some interesting and meaningful frequent cross-graph quasi-cliques in bioinformatics. The experimental results also show that our algorithms are efficient and scalable.
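
The two definitions in the abstract can be made concrete with a small brute-force sketch in Python. This is not the authors' algorithm: it enumerates every vertex subset and is exponential in the number of vertices, so it is usable only on toy inputs. The function names, the adjacency-dictionary representation, and the `min_size` parameter are assumptions made for illustration, and every graph is assumed to share the same vertex set.

```python
from itertools import combinations


def is_quasi_clique(adj, S, gamma):
    """True if every vertex in S has at least gamma * (|S| - 1) neighbours inside S.

    adj maps each vertex to the set of its neighbours (illustrative representation).
    """
    S = set(S)
    need = gamma * (len(S) - 1)
    return all(len(adj.get(v, set()) & (S - {v})) >= need for v in S)


def support(graphs, S, gamma):
    """Number of graphs in which S is a gamma-quasi-clique."""
    return sum(is_quasi_clique(g, S, gamma) for g in graphs)


def frequent_cross_graph_quasi_cliques(graphs, gamma, min_sup, min_size=2):
    """Brute-force enumeration of maximal frequent cross-graph quasi-cliques.

    min_size is a hypothetical parameter used here only to skip trivial
    single-vertex sets, which satisfy the definition vacuously.
    """
    vertices = sorted(set().union(*graphs))
    threshold = min_sup * len(graphs)
    # The quasi-clique property is not preserved under taking subsets,
    # so every candidate set is tested directly.
    frequent = [
        frozenset(c)
        for k in range(min_size, len(vertices) + 1)
        for c in combinations(vertices, k)
        if support(graphs, c, gamma) >= threshold
    ]
    # Keep only sets with no frequent proper superset.
    return [S for S in frequent if not any(S < T for T in frequent)]


# Two toy graphs over the vertices a, b, c, d.
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
g2 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}

strict = frequent_cross_graph_quasi_cliques([g1, g2], gamma=1.0, min_sup=1.0)
relaxed = frequent_cross_graph_quasi_cliques([g1, g2], gamma=0.5, min_sup=0.5)
```

With γ = 1 and min_sup = 1 the only result is {a, b, c}, which is a clique in both graphs. Relaxing both parameters to 0.5 additionally admits {a, c, d} and {b, c, d}, which are 0.5-quasi-cliques in the first graph only.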