Fast Discovery of Group Lag Correlations in Streams

  • Authors:
  • Yasushi Sakurai;Christos Faloutsos;Spiros Papadimitriou

  • Affiliations:
  • NTT Communication Science Laboratories;Carnegie Mellon University;IBM T.J. Watson Research Center

  • Venue:
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The study of data streams has received considerable attention in various communities (theory, databases, data mining, networking), due to several important applications, such as network analysis, sensor monitoring, financial data analysis, and moving object tracking. Our goal in this article is to monitor multiple numerical streams and determine which pairs are correlated with lags, as well as the value of each such lag. Lag correlations and anticorrelations are frequent and very interesting in practice. For example, a decrease in interest rates typically precedes an increase in house sales by a few months; higher amounts of fluoride in drinking water may lead to fewer dental cavities some years later. Other lag settings include network analysis, sensor monitoring, financial data analysis, and tracking of moving objects. Such data streams are often correlated or anticorrelated, but with unknown lag. We propose BRAID, a method of detecting lag correlations among data streams. BRAID can handle data streams of semi-infinite length incrementally, quickly, and with small resource consumption. However, BRAID requires space and time quadratic on a number of streams k. We also propose ThinBRAID, which is even faster than BRAID, requiring O(k) space and time per time tick. Our theoretical analysis shows that BRAID/ThinBRAID can estimate lag correlations with little or, often, with no error. Our experiments on real and realistic data show that BRAID and ThinBRAID detect the correct lag perfectly most of the time (the largest relative error was about 1%), while they are significantly faster (up to 40,000 times) than the naïve implementation.