Correlation search in graph databases

Authors:
Yiping Ke;James Cheng;Wilfred Ng
Affiliations:
Hong Kong University of Science and Technology;Hong Kong University of Science and Technology;Hong Kong University of Science and Technology
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Citing 17
Cited 19

Beyond market baskets: generalizing association rules to correlations

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Alternative Interest Measures for Mining Associations in Databases

IEEE Transactions on Knowledge and Data Engineering
Mining Mutually Dependent Patterns

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Frequent Subgraph Discovery

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Selecting the right interestingness measure for association patterns

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
The complexity of theorem-proving procedures

STOC '71 Proceedings of the third annual ACM symposium on Theory of computing
gSpan: Graph-Based Substructure Pattern Mining

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
A quickstart in frequent structure mining can make a difference

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Automatic multimedia cross-modal correlation discovery

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
AutoLag: Automatic Discovery of Lag Correlations in Stream Data

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Graph indexing based on discriminative frequent structure analysis

ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2004
TAPER: A Two-Step Approach for All-Strong-Pairs Correlation Query in Large Databases

IEEE Transactions on Knowledge and Data Engineering
Mining quantitative correlated patterns using an information-theoretic approach

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Hyperclique pattern discovery

Data Mining and Knowledge Discovery
Finding highly correlated pairs efficiently with powerful pruning

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Feature-based similarity search in graph structures

ACM Transactions on Database Systems (TODS)
Fg-index: towards verification-free query processing on graph databases

Proceedings of the 2007 ACM SIGMOD international conference on Management of data

A novel spectral coding in a large graph database

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Correlated pattern mining in quantitative databases

ACM Transactions on Database Systems (TODS)
Efficient query processing on graph databases

ACM Transactions on Database Systems (TODS)
GADDI: distance index based subgraph matching in biological networks

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Top-K Correlation Sub-graph Search in Graph Databases

DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Event Correlations in Sensor Networks

ICCS 2009 Proceedings of the 9th International Conference on Computational Science
Discovery of Correlated Sequential Subgraphs from a Sequence of Graphs

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Frequent subgraph pattern mining on uncertain graph data

Proceedings of the 18th ACM conference on Information and knowledge management
gRegress: extracting features from graph transactions for regression

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Mining correlated subgraphs in graph databases

PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
GPD: A Graph Pattern Diffusion Kernel for Accurate Graph Classification with Applications in Cheminformatics

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Summarization graph indexing: beyond frequent structure-based approach

DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
1-D fast normalized cross-correlation using additions

Digital Signal Processing
MARGIN: Maximal frequent subgraph mining

ACM Transactions on Knowledge Discovery from Data (TKDD)
Structure and attribute index for approximate graph matching in large graphs

Information Systems
Querying large graph databases

DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II
Top-N minimization approach for indicative correlation change mining

MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
CGStream: continuous correlated graph query for data streams

Proceedings of the 21st ACM international conference on Information and knowledge management
Continuous top-k query for graph streams

Proceedings of the 21st ACM international conference on Information and knowledge management

Abstract

Correlation mining has been highly successful in many application domains because it captures the underlying dependency between objects. However, correlation mining from graph databases remains little studied, even though graph data have proliferated in recent years, especially in scientific domains. In this paper, we propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient as its correlation measure in order to take into account the occurrence distributions of graphs. The problem poses significant challenges, since every subgraph of a graph in the database is a candidate and the number of subgraphs is exponential. We derive two necessary conditions that bound the occurrence probability of a candidate in the database. Using this result, we design an efficient algorithm that operates on a much smaller projected database and thus obtains a significantly smaller set of candidates. To improve efficiency further, we develop three heuristic rules and apply them to the candidate set to reduce the search space. Our extensive experiments demonstrate the effectiveness of our method in reducing candidates. The results also confirm the efficiency of our algorithm in mining correlations from large real and synthetic datasets.
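
The abstract gives no formulas, so the following is only an illustrative sketch of the kind of computation it describes, not the paper's algorithm. It assumes each graph's occurrence distribution is a 0/1 vector over the database (1 where the graph occurs as a subgraph of that database graph), in which case Pearson's correlation coefficient reduces to the phi coefficient over supports. The function names, the threshold name theta, and the bound derivation (from the fact that joint support cannot exceed either individual support) are assumptions made for illustration; the paper's two necessary conditions may be stated differently.

```python
from math import sqrt


def pearson_binary(occ_q, occ_g):
    """Pearson correlation of two equal-length 0/1 occurrence vectors.

    For binary data this equals the phi coefficient:
    (supp(q,g) - supp(q)*supp(g)) / sqrt(supp(q)*supp(g)*(1-supp(q))*(1-supp(g))).
    """
    n = len(occ_q)
    if n == 0 or n != len(occ_g):
        raise ValueError("occurrence vectors must be non-empty and equal in length")
    supp_q = sum(occ_q) / n
    supp_g = sum(occ_g) / n
    # Joint support: fraction of database graphs containing both q and g.
    supp_qg = sum(1 for a, b in zip(occ_q, occ_g) if a and b) / n
    denom = sqrt(supp_q * supp_g * (1 - supp_q) * (1 - supp_g))
    if denom == 0:
        # Illustrative convention (an assumption): a graph occurring in all
        # or none of the database graphs is treated as uncorrelated.
        return 0.0
    return (supp_qg - supp_q * supp_g) / denom


def support_bounds(supp_q, theta):
    """Range of supp(g) outside which correlation >= theta is impossible.

    Derived here from supp(q,g) <= min(supp(q), supp(g)), for 0 < theta <= 1
    and 0 < supp_q < 1. Hypothetical helper, not taken from the paper.
    """
    lower = supp_q / (supp_q + (1 - supp_q) / theta ** 2)
    upper = supp_q / (supp_q + (1 - supp_q) * theta ** 2)
    return lower, upper


# Usage: a query graph occurring in 2 of 4 database graphs.
occ_q = [1, 1, 0, 0]
occ_g = [1, 0, 0, 0]
score = pearson_binary(occ_q, occ_g)
low, high = support_bounds(sum(occ_q) / len(occ_q), theta=0.5)
```

Under these assumptions, a candidate whose support falls outside the interval returned by support_bounds can be discarded without computing its correlation, which is the general flavour of candidate reduction the abstract refers to.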