Designing Random Sample Synopses with Outliers

Authors:
Philipp Rösch; Rainer Gemulla; Wolfgang Lehner
Affiliations:
Technische Universität Dresden / Faculty of Computer Science / Database Technology Group, 01062 Dresden, Germany. philipp.roesch@tu-dresden.de;Technische Universität Dresden / Faculty of Computer Science / Database Technology Group, 01062 Dresden, Germany. rainer.gemulla@tu-dresden.de;Technische Universität Dresden / Faculty of Computer Science / Database Technology Group, 01062 Dresden, Germany. wolfgang.lehner@tu-dresden.de
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Abstract

Random sampling is one of the most widely used means to build synopses of large datasets because random samples can be used for a wide range of analytical tasks. Unfortunately, the quality of the estimates derived from a sample is negatively affected by the presence of "outliers" in the data. In this paper, we show how to circumvent this shortcoming by constructing outlier-aware sample synopses. Our approach extends the well-known outlier indexing scheme to multiple aggregation columns.
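The following is a minimal sketch of the classic single-column outlier-indexing idea that the abstract builds on. It is not the authors' multi-column method. It assumes a SUM query over one numeric column and uses the common formulation in which, after sorting, the k outliers are the i smallest and k - i largest values, with i chosen so that the remaining values have minimum variance. Outliers are stored exactly, a uniform random sample is drawn from the remaining values, and the estimate adds the exact outlier sum to the scaled-up sample sum. The function names `select_outliers` and `estimate_sum` are hypothetical and only serve as illustration.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def select_outliers(values: Sequence[float], k: int) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: split values into (outliers, rest) with len(outliers) == k.

    After sorting, try every split that removes the i smallest and k - i largest
    values, and keep the split whose remaining values have the lowest variance.
    """
    s = sorted(values)
    n = len(s)
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < len(values)")
    best_i, best_var = 0, float("inf")
    for i in range(k + 1):
        rest = s[i : n - (k - i)]            # drop i smallest and k - i largest
        var = statistics.pvariance(rest)
        if var < best_var:
            best_i, best_var = i, var
    outliers = s[:best_i] + s[n - (k - best_i):]
    rest = s[best_i : n - (k - best_i)]
    return outliers, rest


def estimate_sum(values: Sequence[float], k: int, sample_size: int,
                 seed: Optional[int] = None) -> float:
    """Hypothetical estimator: exact outlier sum plus a scaled sample of the rest."""
    outliers, rest = select_outliers(values, k)
    rng = random.Random(seed)
    m = min(sample_size, len(rest))
    if m <= 0:
        raise ValueError("sample_size must be positive")
    sample = rng.sample(rest, m)             # uniform sample without replacement
    # Outliers contribute exactly; the sample sum is scaled by |rest| / m.
    return sum(outliers) + len(rest) / m * sum(sample)
```

For example, with the values `[1, 2, 3, 4, 1000]` and `k = 1`, the value 1000 is kept in the outlier index, so it never inflates or vanishes from the sample, and only the low-variance remainder `[1, 2, 3, 4]` is estimated by sampling. Handling several aggregation columns at once, which is the contribution the abstract describes, is not covered by this sketch.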