Fast window correlations over uncooperative time series

Authors:
Richard Cole; Dennis Shasha; Xiaojian Zhao
Affiliations:
New York University; New York University; New York University
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

References (24)

Elements of information theory.
Fast subsequence matching in time-series databases. SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data.
Similarity-based queries for time series data. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
Efficiently supporting ad hoc queries in large datasets of time sequences. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
Efficient search for approximate nearest neighbor in high dimensional spaces. STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing.
Random sampling techniques for space efficient online computation of order statistics of large datasets. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data.
A comparison of DFT and DWT based similarity search in time-series databases. Proceedings of the ninth international conference on Information and knowledge management.
Database-friendly random projections. PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Space-efficient online computation of quantile summaries. SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data.
Locally adaptive dimensionality reduction for indexing large time series databases. SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data.
Dynamic multidimensional histograms. Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
The Combinatorial Design Approach to Automatic Test Generation. IEEE Software.
Efficient Similarity Search In Sequence Databases. FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms.
HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences. ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering.
Similarity Search in High Dimensions via Hashing. VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases.
Fast Time Sequence Indexing for Arbitrary Lp Norms. VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases.
Identifying Representative Trends in Massive Time Series Data Sets Using Sketches. VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases.
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries. Proceedings of the 27th International Conference on Very Large Data Bases.
Stable distributions, pseudorandom generators, embeddings and data stream computation. FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
Indexing multi-dimensional time-series with support for multiple distance measures. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining.
Online Amnesic Approximation of Streaming Time Series. ICDE '04 Proceedings of the 20th International Conference on Data Engineering.
High Performance Discovery In Time Series: Techniques And Case Studies (Monographs in Computer Science).
StatStream: statistical monitoring of thousands of data streams in real time. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases.
Exact indexing of dynamic time warping. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases.

Cited by (14)

Warping the time on data streams. Data & Knowledge Engineering.
Collaborative data gathering in wireless sensor networks using measurement co-occurrence. Computer Communications.
Flexible least squares for temporal data mining and statistical arbitrage. Expert Systems with Applications: An International Journal.
Managing massive time series streams with multi-scale compressed trickles. Proceedings of the VLDB Endowment.
On privacy in time series data mining. PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining.
Fast approximate correlation for massive time-series data. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.
MG-join: detecting phenomena and their correlation in high dimensional data streams. Distributed and Parallel Databases.
A new class of attacks on time series data mining. Intelligent Data Analysis.
Fast Discovery of Group Lag Correlations in Streams. ACM Transactions on Knowledge Discovery from Data (TKDD).
A review on time series data mining. Engineering Applications of Artificial Intelligence.
Preserving Privacy in Time Series Data Mining. International Journal of Data Warehousing and Mining.
Efficient sentiment correlation for large-scale demographics. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data.
Discovering longest-lasting correlation in sequence databases. Proceedings of the VLDB Endowment.
On clustering large number of data streams. Intelligent Data Analysis.

Abstract

Data arriving in time order (a data stream) arises in fields including physics, finance, medicine, and music. Often the data comes from sensors (in physics and medicine, for example) whose data rates continue to improve dramatically as sensor technology improves. Further, the number of sensors is increasing, so correlating data between sensors becomes ever more critical in order to distill knowledge from the data. In many applications, such as finance, recent correlations are of far more interest than long-term correlation, so correlation over sliding windows (windowed correlation) is the desired operation. Fast response is desirable in many applications (e.g., to aim a telescope at an activity of interest or to perform a stock trade). These three factors (data size, windowed correlation, and fast response) motivate this work.

Previous work [10, 14] showed how to compute Pearson correlation using Fast Fourier Transforms and Wavelet transforms, but such techniques do not work for time series in which the energy is spread over many frequency components and which therefore resemble white noise. For such "uncooperative" time series, this paper shows how to achieve high-performance windowed Pearson correlation over a variety of data sets by combining several simple techniques: sketches (random projections), convolution, structured random vectors, grid structures, and combinatorial design.
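The core idea of sketch-based windowed correlation can be illustrated in a few lines. The Python below is a minimal sketch, not the authors' implementation: it assumes plain random vectors with +1/-1 entries and recomputes every sketch from scratch, and all function names (`normalize`, `sketch`, `approx_pearson`, `random_sign_vectors`, `sliding_windows`) are hypothetical. It relies on a standard identity: if each window is shifted to zero mean and scaled to unit norm, giving x̂ and ŷ, then corr(x, y) = x̂ · ŷ = 1 - ||x̂ - ŷ||² / 2. Because random projections approximately preserve Euclidean distance, comparing short sketches estimates the correlation of long windows.

```python
import math
import random


def normalize(window):
    """Shift to zero mean and scale to unit Euclidean norm.

    For vectors normalized this way, Pearson correlation equals the dot product.
    """
    n = len(window)
    mean = sum(window) / n
    centered = [v - mean for v in window]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("window is constant; correlation is undefined")
    return [c / norm for c in centered]


def pearson(x, y):
    """Exact Pearson correlation of two equal-length windows."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def random_sign_vectors(k, n, seed=0):
    """Hypothetical helper: k random vectors of length n with independent +1/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def sketch(window, vectors):
    """Project the normalized window onto each random vector (k numbers total)."""
    z = normalize(window)
    return [sum(a * b for a, b in zip(z, r)) for r in vectors]


def approx_pearson(sx, sy):
    """Estimate correlation from two sketches built with the same random vectors.

    Each squared coordinate difference is an unbiased estimate of
    ||x_hat - y_hat||^2; averaging over k coordinates reduces the variance.
    """
    k = len(sx)
    dist_sq = sum((a - b) ** 2 for a, b in zip(sx, sy)) / k
    return 1 - dist_sq / 2


def sliding_windows(series, size, step):
    """Yield (start, window) pairs for windows of length `size` every `step` points."""
    for start in range(0, len(series) - size + 1, step):
        yield start, series[start:start + size]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(1000)]   # white-noise-like ("uncooperative") series
    y = [a + 0.5 * rng.gauss(0, 1) for a in x]   # noisy copy, correlated with x
    size, step, k = 128, 32, 256
    vecs = random_sign_vectors(k, size, seed=1)
    for start, wx in sliding_windows(x, size, step):
        wy = y[start:start + size]
        exact = pearson(wx, wy)
        approx = approx_pearson(sketch(wx, vecs), sketch(wy, vecs))
        print(start, round(exact, 3), round(approx, 3))
```

This naive version costs O(n·k) per window to build each sketch. According to the abstract, the paper combines sketches with convolution, structured random vectors, grid structures, and combinatorial design to make this fast at scale; none of those refinements are attempted here.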