Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns that cannot be obtained from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different labs or during various biological processes may overcome the heavy noise in the data. Moreover, by jointly mining gene expression data and protein-protein interaction data, we may discover clusters of genes that show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways. In this article, we investigate a novel data mining problem, mining frequent cross-graph quasi-cliques, which is generalized from several interesting applications in bioinformatics, cross-market customer segmentation, social network analysis, and Web mining. In a graph, a set of vertices S is a γ-quasi-clique (0 < γ ≤ 1) if each vertex v in S directly connects to at least γ ⋅ (|S| − 1) other vertices in S. Given a set of graphs G1, …, Gn and a parameter min_sup (0 < min_sup ≤ 1), a set of vertices S is a frequent cross-graph quasi-clique if S is a γ-quasi-clique in at least min_sup ⋅ n graphs and no proper superset of S has this property. We build a general model, show why the complete set of frequent cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. Although the problem is difficult, we develop practical algorithms that exploit several effective techniques and heuristics to mine frequent cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real datasets. We demonstrate some interesting and meaningful frequent cross-graph quasi-cliques in bioinformatics. The experimental results also show that our algorithms are efficient and scalable.
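The two definitions in the abstract can be illustrated with a minimal Python sketch. It is not the article's mining algorithm: it only checks, for one given vertex set S, whether each vertex of S is adjacent to at least γ ⋅ (|S| − 1) other vertices of S in a graph, and whether that holds in at least min_sup ⋅ n of the graphs. The function names (`from_edges`, `is_quasi_clique`, `support`, `is_frequent`), the adjacency-set representation, and the toy graphs are assumptions made for illustration, and the maximality condition (no proper superset with the same property) is deliberately not checked.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Assumed representation: an undirected graph as a dict of adjacency sets.
Graph = Dict[Hashable, Set[Hashable]]


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build a symmetric adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def is_quasi_clique(graph: Graph, vertices: Iterable[Hashable], gamma: float) -> bool:
    """True if every vertex in S has at least gamma * (|S| - 1) neighbours in S."""
    s = set(vertices)
    if not s:
        return False
    required = gamma * (len(s) - 1)
    for v in s:
        # Count neighbours of v that lie inside S, excluding v itself.
        inside_degree = len(graph.get(v, set()) & (s - {v}))
        if inside_degree < required:
            return False
    return True


def support(graphs: List[Graph], vertices: Iterable[Hashable], gamma: float) -> int:
    """Number of graphs in which S is a gamma-quasi-clique."""
    s = set(vertices)
    return sum(1 for g in graphs if is_quasi_clique(g, s, gamma))


def is_frequent(graphs: List[Graph], vertices: Iterable[Hashable],
                gamma: float, min_sup: float) -> bool:
    """Frequency test only; maximality of S is NOT checked in this sketch."""
    return support(graphs, vertices, gamma) >= min_sup * len(graphs)


# Hypothetical toy data: three graphs over the same vertex labels.
g1 = from_edges([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])  # triangle plus a pendant
g2 = from_edges([("a", "b"), ("b", "c")])                          # path a-b-c
g3 = from_edges([("a", "b"), ("b", "c"), ("a", "c")])              # triangle
graphs = [g1, g2, g3]
candidate = {"a", "b", "c"}

print(support(graphs, candidate, gamma=1.0))
print(is_frequent(graphs, candidate, gamma=1.0, min_sup=0.6))
print(support(graphs, candidate, gamma=0.5))
```

With γ = 1 the test reduces to an ordinary clique check, so the path graph does not count toward the support; lowering γ to 0.5 requires only one neighbour inside the three-vertex set, and the path then qualifies as well.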