Scale out parallel and distributed CDR stream analytics

  • Authors:
  • Qiming Chen;Meichun Hsu

  • Affiliations:
  • HP Labs, Palo Alto, California and Hewlett Packard Co.;HP Labs, Palo Alto, California and Hewlett Packard Co.

  • Venue:
  • Globe'10 Proceedings of the Third international conference on Data management in grid and peer-to-peer systems
  • Year:
  • 2010

Abstract

In the era of information explosion, huge amounts of data are generated continuously by various sensing devices. These data are often too low-level for analytics and too massive to load into data warehouses for filtering and summarization with reasonable latency. Distributed stream analytics for multilevel abstraction is the key to solving this problem. We advocate a distributed infrastructure for CDR (Call Detail Record) stream analytics in the telecommunication network, where stream processing is integrated into the database engine and carried out as continuous querying; the computation model is based on a network-distributed (rather than clustered) Map-Reduce scheme. We propose a window-based cooperation mechanism that keeps multiple engines synchronized and cooperating on the data falling within a common window boundary, defined by time, cardinality, etc. This mechanism allows the engines to cooperate window by window without centralized coordination. We further propose a quantization mechanism that integrates the discretization and abstraction of continuous-valued data, for efficient and incremental data reduction and, in turn, reduced data movement over the network. These mechanisms play key roles in scaling out CDR stream analytics. The proposed approach has been integrated into the PostgreSQL engine. Our preliminary experiments reveal its merit for large-scale distributed stream processing.
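
The two mechanisms named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, which runs inside the PostgreSQL engine as continuous queries; it only shows the general idea under stated assumptions. The record layout (caller, timestamp, duration), the 60-second window, the 30-second quantization step, and all function names are hypothetical. Each engine derives the window of a record from its timestamp alone, so engines agree on window boundaries without a coordinator, and each engine ships only quantized per-window counts rather than raw records.

```python
from collections import defaultdict

WINDOW_SECONDS = 60   # assumed time-based window size
DURATION_STEP = 30    # assumed quantization step for call duration (seconds)


def window_id(timestamp):
    """Index of the window containing an event time given in seconds."""
    return timestamp // WINDOW_SECONDS


def quantize(duration):
    """Discretize a continuous duration to the lower edge of its bucket."""
    return int(duration // DURATION_STEP) * DURATION_STEP


def map_engine(cdrs):
    """Per-engine 'map' step: reduce raw CDRs to per-window bucket counts."""
    partial = defaultdict(int)
    for caller, timestamp, duration in cdrs:
        partial[(window_id(timestamp), caller, quantize(duration))] += 1
    return dict(partial)


def reduce_window(partials, window):
    """'Reduce' step: merge the counts all engines emitted for one window."""
    merged = defaultdict(int)
    for partial in partials:
        for (w, caller, bucket), count in partial.items():
            if w == window:
                merged[(caller, bucket)] += count
    return dict(merged)


# Two engines see disjoint parts of the stream (made-up sample records).
engine_a = map_engine([("alice", 5, 42.0), ("alice", 20, 50.5), ("bob", 65, 10.0)])
engine_b = map_engine([("alice", 30, 31.0), ("bob", 70, 95.0)])

# Merge the partial results for the first window (seconds 0 to 59).
result = reduce_window([engine_a, engine_b], window=0)
print(result)
```

In this sketch the data sent from a map engine to the reducer grows with the number of distinct (caller, bucket) pairs per window rather than with the number of raw records, which is the kind of data-movement reduction the abstract attributes to quantization.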