Data Stream Systems (DSSs) use cost models to determine whether a DSS can cope with a given workload and to optimize query graphs. However, certain relevant input parameters of these models are often unknown or highly imprecise. Selectivities in particular are stream-dependent and application-specific. In this paper, we describe a method that supports selectivity estimation by taking the attribute value distributions of the input streams into account. The novelty of our approach is the propagation of probability distributions through the query graph in order to obtain estimates for the inner nodes of the graph. For the most common stream operators, we establish formulas that describe their output distribution as a function of their input distributions. For unknown operators such as User-Defined Operators (UDOs), we introduce a method to measure the influence of these operators on arbitrary probability distributions. This method does most of the computational work before the query is deployed and introduces minimal overhead at runtime. Our evaluation framework facilitates the appropriate combination of both methods and allows almost arbitrary query graphs to be modeled.
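To make the idea of propagating distributions concrete, the following is a minimal sketch and not the paper's actual formulas or implementation. It assumes that an attribute's distribution is approximated by an equi-width histogram with values uniform inside each bucket. All names (`Histogram`, `apply_rates`, `rates_less_than`, `profile_udo`) are hypothetical. A known filter gets analytic per-bucket pass rates, while a black-box operator, here simplified to a Boolean predicate, is probed once per bucket before deployment so that only a cheap weighted sum remains at runtime.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Histogram:
    """Assumed representation: bucket boundaries plus bucket probabilities."""
    edges: List[float]  # n + 1 ascending boundaries
    probs: List[float]  # n probabilities that sum to 1


def apply_rates(h: Histogram, rates: List[float]) -> Tuple[float, Histogram]:
    """Combine an input distribution with per-bucket pass rates.

    Returns the selectivity and the renormalized output distribution.
    The output is again treated as uniform within buckets, which is an
    approximation for buckets that were only partially kept.
    """
    kept = [p * r for p, r in zip(h.probs, rates)]
    sel = sum(kept)
    probs = [k / sel for k in kept] if sel > 0 else [0.0] * len(kept)
    return sel, Histogram(h.edges, probs)


def rates_less_than(h: Histogram, c: float) -> List[float]:
    """Analytic pass rates for the known filter `value < c`."""
    rates = []
    for lo, hi in zip(h.edges, h.edges[1:]):
        if c >= hi:
            rates.append(1.0)
        elif c <= lo:
            rates.append(0.0)
        else:
            rates.append((c - lo) / (hi - lo))
    return rates


def profile_udo(udo: Callable[[float], bool], edges: List[float],
                probes: int = 10) -> List[float]:
    """Offline step (hypothetical): probe a black-box predicate at evenly
    spaced points in each bucket and record the fraction that passes."""
    rates = []
    for lo, hi in zip(edges, edges[1:]):
        step = (hi - lo) / probes
        points = [lo + (i + 0.5) * step for i in range(probes)]
        rates.append(sum(1 for v in points if udo(v)) / probes)
    return rates


# Source distribution of one attribute of the input stream.
source = Histogram([0, 10, 20, 30, 40], [0.1, 0.2, 0.3, 0.4])

# Two chained filters: the second one sees the output of the first.
sel1, after_f1 = apply_rates(source, rates_less_than(source, 25))
sel2, after_f2 = apply_rates(after_f1, rates_less_than(after_f1, 10))

# Black-box operator: profiled once offline, reused for any input histogram.
udo_rates = profile_udo(lambda v: v >= 15, source.edges)
sel_udo, after_udo = apply_rates(source, udo_rates)
```

The selectivity of the second filter is computed against the output distribution of the first rather than against the source distribution, which is what yields estimates for inner nodes of the query graph. Under the stated assumptions, `sel1` is 0.45 and `sel2` is the conditional fraction 0.1 / 0.45.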