MG-join: detecting phenomena and their correlation in high dimensional data streams

  • Authors:
  • Ibrahim Kamel;Zaher Aghbari;Thuraya Awad

  • Affiliations:
  • Dept of Electrical and Computer Engineering, University of Sharjah, Sharjah, United Arab Emirates;Dept of Computer Science, University of Sharjah, Sharjah, United Arab Emirates;Dept of Computer Science, University of Sharjah, Sharjah, United Arab Emirates

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2010

Abstract

A phenomenon appears in a sensor network when a group of sensors continuously produces similar readings (i.e., data streams) over a period of time. Detecting it requires processing hundreds or even thousands of data streams in real time. This paper focuses on detecting environmental phenomena and determining possible correlations between them.

The paper proposes an efficient scheme for detecting and tracking phenomena such as air pollution and oil spills. To achieve a fast online response, the proposed algorithms use a Discrete Fourier Transform (DFT) to reduce the dimensionality of the streams. Each stream is represented by a point in a multidimensional grid in the frequency domain. The algorithm uses an improved unsupervised grid-based clustering technique to detect similar streams and form clusters. The paper also proposes an efficient algorithm for detecting correlation among phenomena. This algorithm calculates the correlation coefficient in the frequency domain, reusing the DFT coefficients already calculated for detecting the phenomena, and it needs only a few of those coefficients.

Experiments on synthetic data streams show that the proposed algorithm for detecting and tracking phenomena is much faster than the DBSCAN clustering technique, which is based on the R-tree index. At the same time, the proposed phenomena detection algorithm achieves the same clustering quality as DBSCAN while using only two DFT coefficients in most cases. The experimental results also show that the proposed technique for detecting correlation among phenomena performs as well as the traditional Pearson correlation formula but is much faster.
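The idea of computing correlation from a few DFT coefficients can be illustrated with a small sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes each stream window is shifted to zero mean and scaled to unit norm, so that by Parseval's theorem the Pearson correlation equals the inner product of the unitary DFT coefficients. The function names, the `cell_width` parameter, and the choice of keeping coefficients 1..m are all assumptions made for illustration.

```python
import cmath
import math


def normalize(stream):
    """Shift a window to zero mean and scale it to unit Euclidean norm."""
    n = len(stream)
    mean = sum(stream) / n
    centered = [v - mean for v in stream]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        raise ValueError("a constant stream has undefined correlation")
    return [v / norm for v in centered]


def dft_coefficients(stream, m):
    """Return unitary DFT coefficients 1..m of the normalized window.

    Coefficient 0 is skipped because it is zero once the mean is removed.
    Assumes m < len(stream) / 2.
    """
    x = normalize(stream)
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(1, m + 1)
    ]


def grid_cell(coeffs, cell_width):
    """Map the retained coefficients to a grid cell (hypothetical layout):
    one integer index per real and imaginary part."""
    return tuple(
        math.floor(part / cell_width) for c in coeffs for part in (c.real, c.imag)
    )


def approx_correlation(coeffs_x, coeffs_y):
    """Approximate Pearson correlation from the retained coefficients.

    The factor 2 accounts for the conjugate-symmetric half of the spectrum
    of a real-valued signal.
    """
    return 2 * sum((a * b.conjugate()).real for a, b in zip(coeffs_x, coeffs_y))


def pearson(x, y):
    """Exact Pearson correlation in the time domain, for comparison."""
    return sum(p * q for p, q in zip(normalize(x), normalize(y)))
```

For streams whose energy is concentrated in the low frequencies, `approx_correlation(dft_coefficients(x, 2), dft_coefficients(y, 2))` should be close to `pearson(x, y)`, while touching only two complex numbers per stream instead of the full window. Streams that fall into the same or neighboring `grid_cell` values are candidates for the same cluster.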