Creating probabilistic databases from imprecise time-series data

  • Authors: Saket Sathe; Hoyoung Jeung; Karl Aberer
  • Affiliations: Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland (all authors)
  • Venue: ICDE '11: Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
  • Year: 2011

Abstract

Although efficient processing of probabilistic databases is a well-established field, a wide range of applications still cannot benefit from these techniques because they lack a means of creating probabilistic databases. Associating concrete probability values with given time-series data to form a probabilistic database is a challenging problem, since the probability distributions used to derive such values vary over time. In this paper, we propose a novel approach to creating tuple-level probabilistic databases from (imprecise) time-series data. To the best of our knowledge, this is the first work that introduces a generic solution for creating probabilistic databases from arbitrary time series, and it works in both online and offline fashion. Our approach consists of two key components. The first is a set of dynamic density metrics that infer time-dependent probability distributions for time series, based on various mathematical models. Our main metric, called the GARCH metric, robustly captures such evolving probability distributions even when a given time series contains erroneous values. The second is the Ω-View builder, which creates probabilistic databases from the probability distributions inferred by the dynamic density metrics. For efficient processing, we introduce the σ-cache, which reuses information derived from probability values generated at previous times. Extensive experiments over real datasets demonstrate the effectiveness of our approach.
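
The two components described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: it assumes a textbook GARCH(1,1) variance recursion with hand-picked parameters (`omega`, `alpha`, `beta`), a constant sample mean, and a normal distribution at each time step. The function names and the value-range partition (`edges`) are hypothetical, chosen only to show how a time-varying distribution could be turned into tuple-level probabilities.

```python
import math


def garch_variances(series, omega=0.1, alpha=0.1, beta=0.8):
    """Return the sample mean and a time-varying variance per observation.

    Assumed GARCH(1,1) recursion (parameters are illustrative, not fitted):
        var_t = omega + alpha * err_{t-1}**2 + beta * var_{t-1}
    """
    mean = sum(series) / len(series)
    # Start from the sample variance; guard against a zero variance.
    var = max(sum((x - mean) ** 2 for x in series) / len(series), 1e-12)
    variances = []
    for x in series:
        variances.append(var)
        err = x - mean
        var = omega + alpha * err ** 2 + beta * var
    return mean, variances


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def probabilistic_tuples(series, edges):
    """Emit (time, low, high, probability) rows, one per value range."""
    mean, variances = garch_variances(series)
    rows = []
    for t, var in enumerate(variances):
        sigma = math.sqrt(var)
        for low, high in zip(edges, edges[1:]):
            p = normal_cdf(high, mean, sigma) - normal_cdf(low, mean, sigma)
            rows.append((t, low, high, p))
    return rows


if __name__ == "__main__":
    readings = [20.0, 20.5, 19.8, 21.2, 20.1]  # made-up sensor values
    ranges = [-math.inf, 19.0, 20.0, 21.0, math.inf]
    for row in probabilistic_tuples(readings, ranges):
        print(row)
```

Each time step yields one row per value range, and the probabilities for a given time step sum to 1 because the ranges cover the whole real line. A cache in the spirit of the σ-cache could plausibly reuse rows whenever the inferred standard deviation repeats, but the abstract does not specify that mechanism, so it is left out here.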