Creating probabilistic databases from imprecise time-series data

Authors:
Saket Sathe; Hoyoung Jeung; Karl Aberer
Affiliations:
Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland; Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland; Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Abstract

Although efficient processing of probabilistic databases is a well-established field, a wide range of applications still cannot benefit from these techniques because they lack a means of creating probabilistic databases. Associating concrete probability values with given time-series data to form a probabilistic database is a challenging problem, since the probability distributions from which such values are derived vary over time. In this paper, we propose a novel approach for creating tuple-level probabilistic databases from (imprecise) time-series data. To the best of our knowledge, this is the first work to introduce a generic solution for creating probabilistic databases from arbitrary time series that works both online and offline. Our approach consists of two key components. The first is a set of dynamic density metrics that infer time-dependent probability distributions for time series, based on various mathematical models. Our main metric, called the GARCH metric, robustly captures such evolving probability distributions regardless of the presence of erroneous values in a given time series. The second is the Ω-View builder, which creates probabilistic databases from the probability distributions inferred by the dynamic density metrics. For efficient processing, we introduce the σ-cache, which reuses information derived from probability values generated at previous times. Extensive experiments over real datasets demonstrate the effectiveness of our approach.
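
To make the pipeline in the abstract concrete, the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes a GARCH(1,1) variance recursion with fixed, hand-picked parameters (`omega`, `alpha`, `beta`), a naive mean forecast equal to the previous observation, a Gaussian distribution at each time step, and a fixed grid of value ranges. All function names and parameters are hypothetical. The σ-cache and parameter fitting are omitted.

```python
import math


def garch_variances(residuals, omega, alpha, beta):
    """GARCH(1,1): sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1].

    Assumes alpha + beta < 1 so that the unconditional variance exists;
    it is used as the starting value sigma2[0].
    """
    sigma2 = [omega / (1.0 - alpha - beta)]
    for e in residuals[:-1]:
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return sigma2


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian with given mean and std."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def build_view(t, mean, std, lo, hi, bins):
    """Return tuples (t, range_lo, range_hi, probability) over a fixed grid.

    Each tuple carries the probability that the true value at time t
    lies in [range_lo, range_hi) under the assumed Gaussian.
    """
    width = (hi - lo) / bins
    rows = []
    for i in range(bins):
        a = lo + i * width
        b = a + width
        rows.append((t, a, b, normal_cdf(b, mean, std) - normal_cdf(a, mean, std)))
    return rows


def create_probabilistic_rows(series, lo, hi, bins=4,
                              omega=0.05, alpha=0.1, beta=0.8):
    """Turn a raw series into tuple-level probabilistic rows (illustrative only)."""
    # Assumption: the mean forecast is the previous value, so the
    # residual at time t is the one-step difference.
    residuals = [0.0] + [series[i] - series[i - 1] for i in range(1, len(series))]
    sigma2 = garch_variances(residuals, omega, alpha, beta)
    rows = []
    for t in range(1, len(series)):
        rows.extend(build_view(t, series[t - 1], math.sqrt(sigma2[t]), lo, hi, bins))
    return rows


if __name__ == "__main__":
    for row in create_probabilistic_rows([20.0, 20.5, 19.8, 21.0], lo=18.0, hi=22.0):
        print(row)
```

Because the variance is recomputed at every step, a large residual widens the next distribution and spreads probability across more ranges. In practice the GARCH parameters would be estimated from data rather than fixed by hand.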