Statistical estimators for relational algebra expressions

Authors:
Wen-Chi Hou;Gultekin Ozsoyoglu;Baldeo K. Taneja
Affiliations:
Department of Computer Engineering and Science and Center for Automation and Intelligent Systems, Case Western Reserve University, Cleveland, Ohio;Department of Computer Engineering and Science and Center for Automation and Intelligent Systems, Case Western Reserve University, Cleveland, Ohio;Department of Computer Engineering and Science and Center for Automation and Intelligent Systems, Case Western Reserve University, Cleveland, Ohio and Department of Mathematics and Statistics, Cas ...
Venue:
Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Year:
1988

Abstract

Present database systems process all the data related to a query before returning a response. As a result, the volume of data to be processed becomes excessive for real-time and time-constrained environments. A new methodology is needed that systematically reduces the time spent processing the data a query involves. To this end, we propose to use data samples and construct an approximate synthetic response to a given query.

In this paper, we consider only COUNT(E) type queries, where E is an arbitrary relational algebra expression. We make no assumptions about the distribution of attribute values or the ordering of tuples in the input relations, and we propose consistent and unbiased estimators for arbitrary COUNT(E) type queries. We design a sampling plan based on the cluster sampling method to improve the utilization of sampled data and to reduce the cost of sampling. We also evaluate the performance of the proposed estimators.
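To make the sampling idea concrete, here is a minimal sketch of standard scale-up estimators for two simple COUNT(E) cases, a selection and a two-way join, plus a cluster-sampling variant. It is an illustration under stated assumptions, not the authors' estimators: relations are in-memory Python lists, "pages" (lists of tuples) stand in for the clusters, and all function names are hypothetical. Under simple random sampling without replacement, each tuple (or each pair of tuples for the join) enters the sample with a known probability, so scaling the sample count by the inverse of that probability gives an unbiased estimate.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def count_select_srs(relation: Sequence[T], pred: Callable[[T], bool],
                     n: int, rng: random.Random) -> float:
    """Hypothetical helper: estimate COUNT(sigma_pred(R)) from a simple random
    sample of n tuples drawn without replacement, as N * (qualifying / n)."""
    N = len(relation)
    sample = rng.sample(relation, n)  # SRS without replacement
    hits = sum(1 for t in sample if pred(t))
    return N * hits / n


def count_join_srs(r: Sequence[T], s: Sequence[U],
                   match: Callable[[T, U], bool],
                   n_r: int, n_s: int, rng: random.Random) -> float:
    """Hypothetical helper: estimate COUNT(R join S) by joining two independent
    samples and scaling the matching-pair count by (N_R * N_S) / (n_r * n_s)."""
    sr = rng.sample(r, n_r)
    ss = rng.sample(s, n_s)
    pairs = sum(1 for a in sr for b in ss if match(a, b))
    return len(r) * len(s) * pairs / (n_r * n_s)


def count_select_cluster(pages: Sequence[Sequence[T]],
                         pred: Callable[[T], bool],
                         m: int, rng: random.Random) -> float:
    """Hypothetical helper: cluster sampling. Sample m whole pages (assumed to
    be the clusters), use every tuple on them, and estimate
    M * (mean number of qualifying tuples per sampled page)."""
    M = len(pages)
    chosen = rng.sample(pages, m)
    per_page = [sum(1 for t in page if pred(t)) for page in chosen]
    return M * sum(per_page) / m


if __name__ == "__main__":
    rng = random.Random(42)
    rel = list(range(1000))
    pages = [rel[i:i + 50] for i in range(0, len(rel), 50)]
    is_tenth = lambda t: t % 10 == 0  # true COUNT is 100
    print(count_select_srs(rel, is_tenth, 100, rng))
    print(count_select_cluster(pages, is_tenth, 4, rng))
```

The cluster variant reads every tuple on each sampled page, which reflects the motivation given in the abstract: a page fetched from disk costs the same whether one tuple or all of its tuples are used. How much precision this buys depends on how qualifying tuples are spread across pages, which is why the sketch makes no claim about variance.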