Effective variation management for pseudo periodical streams

  • Authors:
  • Lv-an Tang;Bin Cui;Hongyan Li;Gaoshan Miao;Dongqing Yang;Xinbiao Zhou

  • Affiliations:
  • Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China;Peking University, Beijing, China

  • Venue:
  • Proceedings of the 2007 ACM SIGMOD international conference on Management of data
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many database applications require the analysis and processing of data streams. In such systems, huge amounts of data arrive rapidly and their values change over time. The variations on streams typically imply some fundamental changes of the underlying objects and possess significant domain meanings. In some data streams, successive events seem to recur in a certain time interval, but the data indeed evolves with tiny differences as time elapses. This feature is called pseudo periodicity, which poses a non-trivial challenge to variation management in data streams. This paper presents our research effort in online variation management over such streams, and the idea can be applied to the problem domain of medical applications, such as patient vital signal monitoring. We propose a new method named Pattern Growth Graph (PGG) to detect and manage variations over pseudo periodical streams. PGG adopts the wave-pattern to capture the major information of data evolution and represent them compactly. With the help of wave-pattern matching algorithm, PGG detects the stream variations in a single pass over the stream data. PGG only stores the different segments of the pattern for incoming stream, and hence it can substantially compress the data without losing important information. The statistical information of PGG helps to distinguish meaningful data changes from noise and to reconstruct the stream with acceptable accuracy. Extensive experiments on real datasets containing millions of data items demonstrate the feasibility and effectiveness of the proposed scheme.