Antisampling for Estimation: An Overview

Authors: Neil C. Rowe
Affiliations: Naval Postgraduate School, Monterey, CA
Venue: IEEE Transactions on Software Engineering
Year: 1985

Abstract

We survey a new way to get quick estimates of the values of simple statistics (such as count, mean, standard deviation, maximum, median, and mode frequency) on a large data set. This approach is a comprehensive attempt (apparently the first) to estimate statistics without any sampling. Our "antisampling" techniques have analogies to those of sampling and exhibit similar estimation accuracy, but can run much faster than sampling on large computer databases. Antisampling exploits computer science ideas from database theory and expert systems, building an auxiliary structure called a "database abstract." We make detailed comparisons to several different kinds of sampling.
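
As a rough illustration of estimating without sampling (a sketch under stated assumptions, not the paper's actual inference rules), suppose a database abstract stores the count, sum, minimum, and maximum of some attribute over a large set, and that the size of the subset of interest is known. The names AbstractEntry and subset_mean_bounds are hypothetical. Absolute bounds on the subset mean then follow from arithmetic alone, without reading any data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractEntry:
    """Hypothetical precomputed statistics for one attribute of a large set."""
    count: int
    total: float
    minimum: float
    maximum: float


def subset_mean_bounds(entry: AbstractEntry, subset_size: int) -> tuple[float, float]:
    """Bound the mean of any subset of the given size, using only stored statistics.

    The items outside the subset each lie in [minimum, maximum], so the subset
    sum is the stored total minus something between rest*minimum and rest*maximum.
    """
    if not 0 < subset_size <= entry.count:
        raise ValueError("subset_size must be between 1 and the stored count")
    rest = entry.count - subset_size
    low = max(entry.minimum, (entry.total - rest * entry.maximum) / subset_size)
    high = min(entry.maximum, (entry.total - rest * entry.minimum) / subset_size)
    return low, high


if __name__ == "__main__":
    # Assumed example values, chosen only for illustration.
    entry = AbstractEntry(count=10, total=50.0, minimum=0.0, maximum=10.0)
    print(subset_mean_bounds(entry, 8))
```

The bounds tighten as the subset approaches the full set and collapse to the stored mean when the two coincide; for small subsets they fall back to the stored minimum and maximum.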