Propagation of densities of streaming data within query graphs

Authors:
Michael Daum;Frank Lauterwald;Philipp Baumgärtel;Klaus Meyer-Wegener
Affiliations:
Dept. of Computer Science, University of Erlangen-Nuremberg, Germany;Dept. of Computer Science, University of Erlangen-Nuremberg, Germany;Dept. of Computer Science, University of Erlangen-Nuremberg, Germany;Dept. of Computer Science, University of Erlangen-Nuremberg, Germany
Venue:
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
Year:
2010

Abstract

Data Stream Systems (DSSs) use cost models to determine whether a DSS can cope with a given workload and to optimize query graphs. However, some relevant input parameters of these models are often unknown or highly imprecise. Selectivities in particular are stream-dependent and application-specific. In this paper, we describe a method that supports selectivity estimation by taking the attribute value distributions of the input streams into account. The novelty of our approach is the propagation of probability distributions through the query graph in order to give estimates for the inner nodes of the graph. For most common stream operators, we establish formulas that describe their output distribution as a function of their input distributions. For unknown operators such as User-Defined Operators (UDOs), we introduce a method to measure the influence of these operators on arbitrary probability distributions. This method does most of the computational work before the query is deployed and introduces minimal overhead at runtime. Our evaluation framework facilitates the appropriate combination of both methods and allows almost arbitrary query graphs to be modeled.
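
To make the idea of propagating a distribution through an operator concrete, the following is a minimal sketch for a single selection operator. It is not taken from the paper: the histogram representation, the assumption that values are uniformly distributed within each bin, and the name `filter_less_than` are all illustrative assumptions. The selectivity is the probability mass that satisfies the predicate, and the output distribution is the input distribution truncated at the threshold and renormalized.

```python
def filter_less_than(edges, probs, threshold):
    """Propagate a histogram through a selection `value < threshold`.

    edges: sorted bin boundaries, with len(edges) == len(probs) + 1
    probs: probability mass per bin (expected to sum to 1)
    Returns (selectivity, output_probs).
    Assumption (illustrative): values are uniform within each bin.
    """
    kept = []
    for lo, hi, p in zip(edges, edges[1:], probs):
        if threshold >= hi:
            frac = 1.0            # whole bin passes the filter
        elif threshold <= lo:
            frac = 0.0            # whole bin is rejected
        else:
            frac = (threshold - lo) / (hi - lo)  # bin is cut by the threshold
        kept.append(p * frac)

    selectivity = sum(kept)
    if selectivity == 0:
        # Nothing passes; the output distribution is undefined, so return zeros.
        return 0.0, [0.0] * len(probs)
    # Renormalize so the output is again a probability distribution.
    return selectivity, [k / selectivity for k in kept]


# Hypothetical input distribution over the attribute range [0, 40).
edges = [0, 10, 20, 30, 40]
probs = [0.1, 0.2, 0.3, 0.4]

# First filter in the graph: value < 25.
sel1, out1 = filter_less_than(edges, probs, 25)

# Second filter, fed by the first one's output distribution: value < 15.
sel2, out2 = filter_less_than(edges, out1, 15)

print(sel1, sel2, sel1 * sel2)
```

Because the second operator receives the propagated output distribution rather than the original input distribution, its selectivity is estimated relative to what actually reaches it. In this sketch the product of the two selectivities agrees with the probability that a value of the original stream is below 15.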