Time series data is common in many settings, including scientific and financial applications, and the amount of data in these applications is often very large. We seek to support prediction queries over time series data. Prediction relies on model building, which can be too expensive to be practical when it is based on a large number of data points. We propose to use statistical hypothesis tests to choose a proper subset of data points for a given prediction query interval. This involves two steps: choosing a proper history length, and choosing the number of data points to use within that history. Further, we use an I/O-conscious skip list data structure to provide samples of the original data set. Based on the statistics collected for a query workload, which we model as a probability mass function (PMF) over query intervals, we devise a randomized algorithm that selects a set of pre-built models (PMs) to construct, subject to a maintenance cost constraint when there are updates. Given this set of PMs, we discuss query processing strategies not only for point queries, but also for range, aggregation, and JOIN queries. We conduct a comprehensive empirical study on real-world datasets to verify the effectiveness of our approaches and algorithms.
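To illustrate why a skip list is a natural source of samples, the following minimal sketch assigns each data point a random tower height, as a standard skip list does, and treats the points that reach a given level as a sample. It is not the I/O-conscious structure described above, whose layout the abstract does not specify. The function names, the promotion probability `p`, and the synthetic series are all assumptions made for illustration.

```python
import random

def assign_levels(n, p=0.5, max_level=16, seed=0):
    """Draw a skip-list tower height for each of n points (hypothetical helper).

    Each point is promoted one more level with probability p, so a point
    reaches level k with probability p**k.
    """
    rng = random.Random(seed)
    levels = []
    for _ in range(n):
        h = 0
        while h < max_level and rng.random() < p:
            h += 1
        levels.append(h)
    return levels

def sample_at_level(series, levels, k):
    """Return the points whose tower reaches level k, kept in time order."""
    return [point for point, h in zip(series, levels) if h >= k]

# Synthetic (timestamp, value) series, used only for illustration.
series = [(t, float(t % 7)) for t in range(10_000)]
levels = assign_levels(len(series))

fine = sample_at_level(series, levels, 2)    # expected sampling rate p**2
coarse = sample_at_level(series, levels, 3)  # expected sampling rate p**3
```

Because every point at level 3 is also present at level 2, the samples are nested: a model builder could start from a coarse level and descend to finer levels only if more data points are needed.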