Designing Random Sample Synopses with Outliers

  • Authors:
  • Philipp Rösch; Rainer Gemulla; Wolfgang Lehner

  • Affiliations:
  • Technische Universität Dresden / Faculty of Computer Science / Database Technology Group, 01062 Dresden, Germany. philipp.roesch@tu-dresden.de; Technische Universität Dresden / Faculty of Computer Science / Database Technology Group, 01062 Dresden, Germany. rainer.gemulla@tu-dresden.de; Technische Universität Dresden / Faculty of Computer Science / Database Technology Group, 01062 Dresden, Germany. wolfgang.lehner@tu-dresden.de

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

Random sampling is one of the most widely used means to build synopses of large datasets because random samples can be used for a wide range of analytical tasks. Unfortunately, the quality of the estimates derived from a sample is negatively affected by the presence of "outliers" in the data. In this paper, we show how to circumvent this shortcoming by constructing outlier-aware sample synopses. Our approach extends the well-known outlier indexing scheme to multiple aggregation columns.