We consider the problem of computing all-pair correlations in a warehouse containing a large number (e.g., tens of thousands) of time series (or signals). The problem arises in the automatic discovery of patterns and anomalies in data-intensive applications such as data center management, environmental monitoring, and scientific experiments. With existing techniques, however, solving the problem for a large stream warehouse is extremely expensive because of its inherent quadratic I/O and CPU complexities. We propose novel algorithms, based on the Discrete Fourier Transform (DFT) and graph partitioning, to reduce the end-to-end response time of an all-pair correlation query. To minimize I/O cost, we partition a massive set of input signals into smaller batches such that caching the signals one batch at a time maximizes data reuse and minimizes disk I/O. To reduce CPU cost, we propose two approximation algorithms. The first efficiently computes approximate correlation coefficients of similar signal pairs within a given error bound. The second efficiently identifies, without any false positives or false negatives, all signal pairs whose correlations exceed a given threshold. Because of these strict error guarantees, our approximate solutions are as useful as the corresponding exact solutions for many real applications. Moreover, compared with state-of-the-art exact algorithms, our algorithms are up to 17x faster on several real datasets.
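The abstract does not spell out its algorithms, so what follows is only a minimal sketch of the general DFT "filter and refine" idea that such methods build on. It is not the authors' implementation, and the batching and graph-partitioning step is not shown.

The sketch relies on one identity. For z-normalized series of length n, the Pearson correlation equals 1 - d²/(2n), where d is the Euclidean distance. By Parseval's theorem, d² can be bounded from below using only a few DFT coefficients. That bound yields an upper bound on the correlation, which prunes pairs without false negatives (in exact arithmetic). An exact check on the survivors then removes false positives.

The sketch assumes equal-length, non-constant signals held in memory. The names `z_normalize`, `dft_prefix`, `approx_corr`, `exact_corr`, `correlated_pairs` and the parameter `m` (number of kept coefficients) are hypothetical.

```python
import cmath
import math
from itertools import combinations


def z_normalize(x):
    """Shift to zero mean and scale to unit population standard deviation.

    Assumes x is not constant (otherwise sigma is zero).
    """
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sigma for v in x]


def dft_prefix(x, m):
    """Coefficients k = 1..m of the orthonormal DFT of x.

    k = 0 is skipped because it is zero for a z-normalized series.
    A naive O(n*m) loop keeps the sketch dependency-free; use an FFT in practice.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
        for k in range(1, m + 1)
    ]


def exact_corr(xn, yn):
    """Pearson correlation of two z-normalized series of equal length."""
    return sum(a * b for a, b in zip(xn, yn)) / len(xn)


def approx_corr(fx, fy, n):
    """Correlation estimated from truncated DFT prefixes of z-normalized series.

    For z-normalized series, corr = 1 - d^2 / (2n), where d is their Euclidean
    distance. By Parseval, d^2 equals the squared distance between their
    orthonormal DFTs; each kept coefficient k (1 <= k < n/2) has a conjugate
    twin at n - k with the same difference magnitude, so 2 * sum |X_k - Y_k|^2
    is a lower bound on d^2 and this estimate never falls below the true value.
    """
    d2_lower = 2 * sum(abs(a - b) ** 2 for a, b in zip(fx, fy))
    return 1 - d2_lower / (2 * n)


def correlated_pairs(signals, threshold, m):
    """Return (name_a, name_b, corr) for every pair with corr >= threshold."""
    n = len(next(iter(signals.values())))
    if not m < n / 2:
        raise ValueError("m must be below n/2 so kept coefficients and their twins are disjoint")
    normed = {name: z_normalize(s) for name, s in signals.items()}
    prefixes = {name: dft_prefix(s, m) for name, s in normed.items()}
    result = []
    for a, b in combinations(sorted(signals), 2):
        # Filter: the estimate is an upper bound, so pruning here loses no true pairs.
        if approx_corr(prefixes[a], prefixes[b], n) < threshold:
            continue
        # Refine: the exact check discards any false positives that survived the filter.
        r = exact_corr(normed[a], normed[b])
        if r >= threshold:
            result.append((a, b, r))
    return result


# Toy example: "b" is an affine copy of "a"; "c" is orthogonal to both.
signals = {
    "a": [math.sin(2 * math.pi * t / 16) for t in range(32)],
    "b": [2 * math.sin(2 * math.pi * t / 16) + 5 for t in range(32)],
    "c": [math.cos(2 * math.pi * t / 16) for t in range(32)],
}
pairs = correlated_pairs(signals, threshold=0.9, m=4)
print([(a, b) for a, b, _ in pairs])  # prints [('a', 'b')]
```

Keeping only low-frequency coefficients makes the bound tight for smooth signals, whose energy sits in those coefficients. For noisy signals the filter prunes less, but it stays correct. Raising `m` trades more CPU per pair for stronger pruning.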