A skip-list approach for efficiently processing forecasting queries

Authors:
Tingjian Ge;Stan Zdonik
Affiliations:
Brown University;Brown University
Venue:
Proceedings of the VLDB Endowment
Year:
2008



Abstract

Time series data is common in many settings, including scientific and financial applications, where the volume of data is often very large. We seek to support prediction queries over time series data. Prediction relies on model building, which can be too expensive to be practical when it is based on a large number of data points. We propose using statistical hypothesis tests to choose a proper subset of data points for a given prediction query interval. This involves two steps: choosing a proper history length, and choosing the number of data points to use within that history. Further, we use an I/O-conscious skip list data structure to provide samples of the original data set. Based on statistics collected for a query workload, which we model as a probability mass function (PMF) over query intervals, we devise a randomized algorithm that selects a set of pre-built models (PMs) to construct, subject to a maintenance cost constraint when there are updates. Given this set of PMs, we discuss query processing strategies not only for point queries but also for range, aggregation, and JOIN queries. We conduct a comprehensive empirical study on real-world datasets to verify the effectiveness of our approaches and algorithms.
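
The idea of using a skip list to provide samples can be illustrated with a minimal sketch. It is not the authors' I/O-conscious structure: it only shows the general property that each higher level of a randomized skip list is a roughly p-fraction subsample of the level below, so a level can stand in for the full series when fewer points are wanted. The names `build_skip_levels`, `sample_for_query`, and `target_size` are hypothetical, and `target_size` stands in for the number of points that the paper chooses through hypothesis tests.

```python
import random

def build_skip_levels(points, p=0.5, max_level=8, seed=0):
    """Give each point a random height; level k holds points with height > k.

    Level 0 holds every point. Each higher level keeps roughly a
    p-fraction of the level below, in the original time order.
    """
    rng = random.Random(seed)
    levels = [[] for _ in range(max_level)]
    for point in points:
        height = 1
        while height < max_level and rng.random() < p:
            height += 1
        for k in range(height):
            levels[k].append(point)
    return levels

def sample_for_query(levels, target_size):
    """Return the densest non-empty level with at most target_size points.

    Falls back to the sparsest non-empty level if none is small enough.
    target_size is an assumed input, not something computed here.
    """
    non_empty = [lvl for lvl in levels if lvl]
    for lvl in non_empty:
        if len(lvl) <= target_size:
            return lvl
    return non_empty[-1] if non_empty else []

# Hypothetical series of (timestamp, value) pairs.
series = [(t, float(t % 7)) for t in range(1000)]
levels = build_skip_levels(series)
sample = sample_for_query(levels, target_size=100)
```

A forecasting model would then be fitted on `sample` rather than on the full `series`, trading some accuracy for a much smaller model-building cost.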