A learning-based approach to estimate statistics of operators in continuous queries: a case study

Authors:
Like Gao;Min Wang;X. Sean Wang;Sriram Padmanabhan
Affiliations:
George Mason University, VA;IBM T. J. Watson Research Center, NY;George Mason University, VA;IBM T. J. Watson Research Center, NY
Venue:
DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Year:
2003

Citing 26
Cited 1

Discrete-time signal processing

Discrete-time signal processing
Practical selectivity estimation through adaptive sampling

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Continuous queries over append-only databases

SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
C4.5: programs for machine learning

C4.5: programs for machine learning
An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Adaptive selectivity estimation using query feedback

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Fast subsequence matching in time-series databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Similarity-based queries for time series data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
NiagaraCQ: a scalable continuous query system for Internet databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Time series similarity measures (tutorial PM-2)

Tutorial notes of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Rate-based query optimization for streaming information sources

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Continuously adaptive continuous queries over streams

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Continually evaluating similarity-based pattern queries on a streaming time series

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Querying and mining data streams: you only get one look a tutorial

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Access path selection in a relational database management system

SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Evaluating continuous nearest neighbor queries for streaming time series via pre-fetching

Proceedings of the eleventh international conference on Information and knowledge management
Continuous queries over data streams

ACM SIGMOD Record
Continual Queries for Internet Scale Event-Driven Information Delivery

IEEE Transactions on Knowledge and Data Engineering
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Query Size Estimation Using Machine Learning

Proceedings of the Fifth International Conference on Database Systems for Advanced Applications (DASFAA)
Differential evaluation of continual queries

ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
The optimization of queries in relational databases

The optimization of queries in relational databases

Quality-driven evaluation of trigger conditions on streaming time series

Proceedings of the 2005 ACM symposium on Applied computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Statistic estimation such as output size estimation of operators is a well-studied subject in the database research community, mainly for the purpose of query optimization. The assumption, however, is that queries are ad-hoc and therefore the emphasis has been on capturing the data distribution. When long standing continuous queries on a changing database are concerned, a more direct approach, namely building an estimation model for each operator, is possible. In this paper, we propose a novel learning-based method. Our method consists of two steps. The first step is to design a dedicated feature extraction algorithm that can be used incrementally to obtain feature values from the underlying data. The second step is to use a data mining algorithm to generate an estimation model based on the feature values extracted from the historical data. To illustrate the approach, this paper studies the case of similarity-based searches over streaming time series. Experimental results show this approach provides accurate statistic estimates with a low overhead.