On mining cross-graph quasi-cliques

Authors:
Jian Pei;Daxin Jiang;Aidong Zhang
Affiliations:
Simon Fraser University, Canada;State University of New York at Buffalo;State University of New York at Buffalo
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Citing 17
Cited 45

Abstract

Joint mining of multiple data sets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in cross-market customer segmentation, a group of customers who behave similarly in multiple markets should be considered as a more coherent and more reliable cluster than clusters found in a single market. As another example, in bioinformatics, by joint mining of gene expression data and protein interaction data, we can find clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways.

In this paper, we investigate a novel data mining problem, mining cross-graph quasi-cliques, which is generalized from several interesting applications such as cross-market customer segmentation and joint mining of gene expression data and protein interaction data. We build a general model for mining cross-graph quasi-cliques, show why the complete set of cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop an efficient algorithm, Crochet, which exploits several interesting and effective techniques and heuristics to efficaciously mine cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful cross-graph quasi-cliques in bioinformatics. The experimental results also show that algorithm Crochet is efficient and scalable.
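
The abstract does not spell out the model, so the following is only an illustrative sketch under a stated assumption: a vertex set S is treated as a γ-quasi-clique of a graph if every vertex of S is adjacent to at least ⌈γ(|S| − 1)⌉ other vertices of S, and as a cross-graph quasi-clique if that holds in every graph with that graph's own threshold. The function names, the toy "market" graphs, and the thresholds are hypothetical. The code only checks a given candidate set; it does not enumerate patterns and is not the Crochet algorithm.

```python
from math import ceil

def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_quasi_clique(adj, vertices, gamma):
    """Assumed definition: each vertex of S has at least
    ceil(gamma * (|S| - 1)) neighbours inside S."""
    s = set(vertices)
    need = ceil(gamma * (len(s) - 1))
    return all(len(adj.get(v, set()) & (s - {v})) >= need for v in s)

def is_cross_graph_quasi_clique(graphs, gammas, vertices):
    """S must satisfy the quasi-clique condition in every graph,
    each with its own threshold gamma_i."""
    return all(is_quasi_clique(adj, vertices, g)
               for adj, g in zip(graphs, gammas))

# Hypothetical toy data: the same four customers observed in two markets.
market_a = build_adj([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
market_b = build_adj([("a", "b"), ("b", "c"), ("a", "c"), ("d", "a")])
graphs = [market_a, market_b]
gammas = [0.5, 1.0]  # illustrative thresholds, one per graph

coherent = is_cross_graph_quasi_clique(graphs, gammas, {"a", "b", "c"})
too_large = is_cross_graph_quasi_clique(graphs, gammas, {"a", "b", "c", "d"})
```

In this toy input, {a, b, c} passes in both graphs, while {a, b, c, d} passes only in the first, which mirrors the abstract's point that a group coherent across all sources is a stricter and more reliable pattern than one found in a single source.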