Parsimonious temporal aggregation

  • Authors:
  • Juozas Gordevičius; Johann Gamper; Michael Böhlen

  • Affiliations:
  • Institute of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania; Free University of Bozen-Bolzano, Bolzano, Italy; Department of Informatics, University of Zurich, Zurich, Switzerland

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2012

Abstract

Temporal aggregation is an important operation in temporal databases, and different variants thereof have been proposed. In this paper, we introduce a novel temporal aggregation operator, termed parsimonious temporal aggregation (PTA), that overcomes major limitations of existing approaches. PTA takes the result of instant temporal aggregation (ITA) of size n, which might be up to twice as large as the argument relation, and merges similar tuples until a given error bound ($\epsilon$) or size bound (c) is reached. The new operator is data-adaptive and allows the user to control the trade-off between the result size and the error introduced by merging. For the precise evaluation of PTA queries, we propose two dynamic programming-based algorithms, one for size-bounded and one for error-bounded queries, each with a worst-case complexity that is quadratic in n. We present two optimizations that take advantage of temporal gaps and different aggregation groups and achieve a linear runtime in experiments with real-world data. For the quick computation of an approximate PTA answer, we propose an efficient greedy merging strategy with a precision that is upper bounded by O(log n). We present two algorithms that implement this strategy and begin to merge as ITA tuples are produced. They require O(n log (c + β)) time and O(c + β) space, where β is the size of a read-ahead buffer and is typically very small. An empirical evaluation on real-world and synthetic data shows that PTA considerably reduces the size of the aggregation result while introducing only small errors. The greedy algorithms scale to large data sets and introduce less error than other approximation techniques.
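
To make the size-bounded greedy idea concrete, the following is a minimal sketch, not the authors' algorithm. It rests on several assumptions that the abstract does not state: ITA tuples are modeled as a hypothetical `Segment` record with a constant value over an inclusive integer interval; two tuples are mergeable only if they belong to the same aggregation group and are not separated by a temporal gap; a merged tuple takes the length-weighted average of its parts; and error is measured as the increase in sum squared error. The names `Segment`, `merge_cost`, `merge`, and `greedy_reduce` are illustrative only. The sketch rescans all adjacent pairs on every step, so it runs in quadratic time and does not reproduce the O(n log (c + β)) streaming behavior described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical ITA result tuple: a constant value over [start, end]."""
    group: str    # aggregation group
    value: float  # aggregate value
    start: int    # first time point (inclusive)
    end: int      # last time point (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def merge_cost(a: Segment, b: Segment) -> Optional[float]:
    """Increase in sum squared error if a and b are merged.

    Returns None when the pair is not mergeable (assumption: different
    groups or a temporal gap between the two intervals).
    """
    if a.group != b.group or a.end + 1 != b.start:
        return None
    return a.length * b.length / (a.length + b.length) * (a.value - b.value) ** 2


def merge(a: Segment, b: Segment) -> Segment:
    """Merge two adjacent segments using the length-weighted average value."""
    total = a.length + b.length
    value = (a.length * a.value + b.length * b.value) / total
    return Segment(a.group, value, a.start, b.end)


def greedy_reduce(segments: List[Segment], c: int) -> Tuple[List[Segment], float]:
    """Merge the cheapest mergeable adjacent pair until at most c segments remain.

    Expects segments sorted by (group, start). Stops early if no pair can
    be merged, so the result may be larger than c.
    """
    result = list(segments)
    total_error = 0.0
    while len(result) > c:
        best_i, best_cost = None, None
        for i in range(len(result) - 1):
            cost = merge_cost(result[i], result[i + 1])
            if cost is not None and (best_cost is None or cost < best_cost):
                best_i, best_cost = i, cost
        if best_i is None:
            break  # only gaps and group boundaries remain
        result[best_i:best_i + 2] = [merge(result[best_i], result[best_i + 1])]
        total_error += best_cost
    return result, total_error


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    ita = [
        Segment("A", 10.0, 1, 2),
        Segment("A", 11.0, 3, 4),
        Segment("A", 30.0, 5, 6),
        Segment("A", 30.0, 9, 10),  # temporal gap before this tuple
        Segment("B", 5.0, 1, 10),
    ]
    reduced, error = greedy_reduce(ita, c=4)
    for seg in reduced:
        print(seg)
    print("introduced error:", error)
```

With the sample data, the two most similar neighbours in group A are merged first, while the temporal gap and the group boundary are never crossed. An error-bounded variant could use the same loop and stop once the next merge would push the accumulated error above a given threshold instead of checking the size.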