A Bayesian method for guessing the extreme values in a data set

Authors:
Mingxi Wu; Christopher Jermaine
Affiliations:
University of Florida, Gainesville, FL; University of Florida, Gainesville, FL
Venue:
VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases
Year:
2007

Abstract

Many data management problems would benefit from the ability to draw a few samples from a data set and use them to guess the largest (or smallest) value in the entire data set. Min/max online aggregation, top-k query processing, outlier detection, and distance joins are just a few possible applications. This paper details a statistically rigorous, Bayesian approach to this problem. Just as importantly, we demonstrate the utility of the approach by showing how it can be applied to two specific problems that arise in data management.
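
To make the problem concrete, the sketch below shows why the naive guess (the largest value in the sample) is usually too small. It is not the Bayesian estimator from the paper, only a baseline illustration. The data set, the sample size, and the helper names `sample_max` and `hit_rate` are assumptions made for this example. When all values are distinct and n of N items are drawn uniformly without replacement, the sample contains the true maximum with probability n/N, so a small sample rarely sees it.

```python
import random


def sample_max(data, n, rng):
    """Largest value in a uniform sample of n items drawn without replacement."""
    return max(rng.sample(data, n))


def hit_rate(data, n, trials, rng):
    """Fraction of trials in which the sample contains the true maximum."""
    true_max = max(data)
    hits = sum(sample_max(data, n, rng) == true_max for _ in range(trials))
    return hits / trials


rng = random.Random(0)    # fixed seed so the run is repeatable
data = list(range(1000))  # hypothetical data set of distinct values (assumption)
rate = hit_rate(data, n=50, trials=20000, rng=rng)
print(rate)               # expected to be close to n/N = 50/1000
```

The sample maximum never exceeds the true maximum, so it is a one-sided guess. A method of the kind described in the abstract would need to account for the gap between the two.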