We survey a new way to get quick estimates of the values of simple statistics (such as count, mean, standard deviation, maximum, median, and mode frequency) on a large data set. This approach is a comprehensive attempt (apparently the first) to estimate statistics without any sampling. Our "antisampling" techniques have analogies to those of sampling and exhibit similar estimation accuracy, but they can be computed much faster than sampling on large computer databases. Antisampling exploits ideas from database theory and expert systems, building an auxiliary structure called a "database abstract." We make detailed comparisons to several different kinds of sampling.
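To make the idea of estimating from precomputed statistics rather than from a sample more concrete, here is a minimal sketch. It assumes that a "database abstract" is simply a table of per-group aggregates (count, sum, sum of squares, minimum, maximum) over one categorical attribute; the names `SetSummary`, `build_abstract`, `combine_disjoint`, and the two intersection helpers are hypothetical illustrations, not the paper's own construction. Queries over unions of disjoint groups are answered exactly from the abstract. For an intersection of two sets whose only known sizes are |A|, |B|, and the population size N, the sketch gives the standard inclusion-exclusion bounds and, separately, a point estimate that assumes independence.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SetSummary:
    """Precomputed statistics for one predefined subset (hypothetical layout)."""
    count: int
    total: float
    total_sq: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def std(self) -> float:
        # Population standard deviation recovered from the stored moments.
        var = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def summarize(values):
    """Summarize a non-empty collection of numbers."""
    vals = list(values)
    return SetSummary(len(vals), sum(vals), sum(v * v for v in vals),
                      min(vals), max(vals))


def build_abstract(rows, key, value):
    """One pass over the data: group rows by `key`, summarize `value` per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {k: summarize(v) for k, v in groups.items()}


def combine_disjoint(summaries):
    """Exact statistics for a union of disjoint groups, without reading rows.

    Assumes at least one summary is given.
    """
    ss = list(summaries)
    return SetSummary(sum(s.count for s in ss), sum(s.total for s in ss),
                      sum(s.total_sq for s in ss),
                      min(s.minimum for s in ss), max(s.maximum for s in ss))


def intersection_count_bounds(count_a, count_b, population):
    """Absolute bounds on |A and B| from |A|, |B| and the population size."""
    return max(0, count_a + count_b - population), min(count_a, count_b)


def intersection_count_estimate(count_a, count_b, population):
    """Point estimate of |A and B| assuming A and B are independent."""
    return count_a * count_b / population


if __name__ == "__main__":
    rows = [{"dept": "A", "salary": 10.0}, {"dept": "A", "salary": 20.0},
            {"dept": "B", "salary": 30.0}, {"dept": "C", "salary": 40.0}]
    abstract = build_abstract(rows, "dept", "salary")
    ab = combine_disjoint([abstract["A"], abstract["B"]])
    print(ab.count, ab.mean, ab.maximum)    # 3 20.0 30.0
    print(intersection_count_bounds(3, 2, 4))  # (1, 2)
```

The design choice here is to store sums and sums of squares rather than means and standard deviations, because those moments add across disjoint groups. That lets the mean and standard deviation of any union of groups be recovered exactly from the abstract alone. Statistics such as the median or mode frequency do not combine this way, and in this sketch they could only be bounded or estimated under further assumptions.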