Analysis of Accuracy of Data Reduction Techniques

Authors:
Pedro Furtado;Henrique Madeira
Affiliations:
-;-
Venue:
DaWaK '99 Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery
Year:
1999

Citing 7
Cited 1

Numerical recipes in C (2nd ed.): the art of scientific computing

BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
An array-based algorithm for simultaneous multidimensional aggregates

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Data cube approximation and histograms via wavelets

Proceedings of the seventh international conference on Information and knowledge management
Selectivity Estimation Without the Attribute Value Independence Assumption

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Recovering Information from Summary Data

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases

Adding a Performance-Oriented Perspective to Data Warehouse Design

DaWaK 2000 Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery

Quantified Score

Hi-index	0.00

Abstract

There is a growing interest in the analysis of data in warehouses. Data warehouses can be extremely large, and typical queries frequently take too long to answer. Manageable and portable summaries return interactive response times in exploratory data analysis. Obtaining the best estimates for smaller response times and storage needs is the objective of simple data reduction techniques that usually produce coarse approximations. But because the user is exposed to the approximation returned, it is important to determine which queries would not be approximated satisfactorily, in which case either the base data is accessed (if available) or the user is warned. In this paper the accuracy of approximations is determined experimentally for simple data reduction algorithms and several data sets. We show that data cube density and distribution skew are important parameters and that large range queries are approximated much more accurately than point or small range queries. We quantify this and other results that should be taken into consideration when incorporating the data reduction techniques into the design.