In this paper, we propose to query correlated graphs in a data stream scenario: given a query graph q, an algorithm must retrieve all subgraphs whose Pearson's correlation coefficient with q exceeds a threshold Θ over graph data arriving as a stream. Because stream data change dynamically and graph queries are inherently complex, treating graph streams as static datasets is computationally infeasible or ineffective. We propose a novel algorithm, CGStream, that identifies correlated graphs from a data stream by using a sliding window covering a number of consecutive batches of stream records. Our central idea is to regard a stream query as a traversal along the data stream, with the query answered at a number of outlooks over the stream. For each outlook, we derive a lower frequency bound for mining a set of frequent subgraph candidates; the bound guarantees that no pattern is missed between the current outlook and the next. On top of that, we derive an upper correlation bound and a heuristic rule to prune the candidate set, which reduces the computation cost at each outlook. Experimental results demonstrate that the proposed algorithm is several times, or even an order of magnitude, more efficient than the straightforward algorithm, while also achieving good query precision.
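To make the query criterion concrete, the following is a minimal sketch of how Pearson's correlation between a query graph q and a candidate subgraph g could be computed over a sliding window of batches. It is not CGStream itself: it applies none of the frequency or correlation bounds described above and simply recomputes the coefficient from scratch, roughly in the spirit of a straightforward baseline. Several things here are assumptions made for illustration: the names `pearson_from_supports` and `SlidingWindowCorrelation`, the representation of each stream record as a pair of booleans (whether the record contains q, whether it contains g) with the subgraph containment test left outside the sketch, and the convention of returning 0 when the coefficient is undefined.

```python
from collections import deque
from math import sqrt


def pearson_from_supports(sup_q, sup_g, sup_qg):
    """Pearson correlation of two binary occurrence indicators.

    sup_q, sup_g: fraction of records containing q and g, respectively.
    sup_qg: fraction of records containing both q and g.
    """
    denom = sup_q * sup_g * (1.0 - sup_q) * (1.0 - sup_g)
    if denom <= 0.0:
        # Assumed convention: treat an undefined coefficient as 0.
        return 0.0
    return (sup_qg - sup_q * sup_g) / sqrt(denom)


class SlidingWindowCorrelation:
    """Keeps the most recent `window` batches and recomputes the correlation."""

    def __init__(self, window):
        # deque with maxlen drops the oldest batch automatically.
        self.batches = deque(maxlen=window)

    def add_batch(self, records):
        # records: iterable of (has_q, has_g) boolean pairs for one batch.
        self.batches.append(list(records))

    def correlation(self):
        records = [r for batch in self.batches for r in batch]
        n = len(records)
        if n == 0:
            return 0.0
        sup_q = sum(1 for has_q, _ in records if has_q) / n
        sup_g = sum(1 for _, has_g in records if has_g) / n
        sup_qg = sum(1 for has_q, has_g in records if has_q and has_g) / n
        return pearson_from_supports(sup_q, sup_g, sup_qg)


if __name__ == "__main__":
    win = SlidingWindowCorrelation(window=2)
    win.add_batch([(True, False), (False, True)])
    win.add_batch([(True, True), (False, False)])
    print(win.correlation())
    win.add_batch([(True, True), (False, False)])  # evicts the first batch
    print(win.correlation())
```

In this sketch a candidate g would be reported at a given point in the stream when `correlation()` exceeds the threshold Θ. Recomputing supports for every candidate at every step is what makes the naive approach expensive, which is the cost the bounds in the abstract are meant to reduce.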