Approximate Query Answering Using Data Warehouse Striping

Authors:
Jorge R. Bernardino;Pedro S. Furtado;Henrique C. Madeira
Affiliations:
Polytechnic of Coimbra, ISEC, DEIS, Apt. 10057, P-3030-601 Coimbra, Portugal. jorge@isec.pt;University of Coimbra, DEI, Pólo II, P-3030-290 Coimbra, Portugal. pnf@dei.uc.pt;University of Coimbra, DEI, Pólo II, P-3030-290 Coimbra, Portugal. henrique@dei.uc.pt
Venue:
Journal of Intelligent Information Systems - Special issue on data warehousing and knowledge discovery
Year:
2002

Abstract

This paper presents and evaluates a simple but very effective method for implementing large data warehouses on an arbitrary number of computers, achieving very high query execution performance and scalability. The data is distributed and processed across a potentially large number of autonomous computers using our technique, called data warehouse striping (DWS). The major problem of the DWS technique is that it would require a very expensive cluster of computers with fault-tolerant capabilities to prevent a fault in a single computer from stopping the whole system. In this paper, we propose a radically different approach to the unavailability of one or more computers in the cluster, one that allows DWS to be used with a very large number of inexpensive computers. The proposed approach is based on approximate query answering techniques that make it possible to deliver an approximate answer to the user even when one or more computers in the cluster are unavailable. The evaluation presented in the paper shows, both analytically and experimentally, that the approximate results obtained this way have a very small error that is negligible in most cases.
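The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify the estimator. The sketch assumes that fact rows are striped round-robin over the nodes and that a SUM over the whole warehouse is estimated by scaling the partial sum from the available nodes by N / (N - k), where N is the number of nodes and k the number of unavailable ones. The names `stripe` and `approximate_sum` are hypothetical.

```python
import random


def stripe(rows, n_nodes):
    """Distribute fact rows round-robin over n_nodes partitions (assumed scheme)."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def approximate_sum(nodes, available):
    """Estimate the global SUM using only the nodes listed in `available`.

    Assumption: each node holds a roughly uniform share of the facts, so the
    partial sum is scaled by (total nodes / available nodes).
    """
    if not available:
        raise ValueError("at least one node must be available")
    partial = sum(sum(nodes[i]) for i in available)
    return partial * len(nodes) / len(available)


# Synthetic fact measures; real warehouse data may be far more skewed.
random.seed(0)
facts = [random.uniform(0, 100) for _ in range(100_000)]
nodes = stripe(facts, 10)

exact = sum(facts)
# Simulate node 7 being unavailable.
estimate = approximate_sum(nodes, available=[0, 1, 2, 3, 4, 5, 6, 8, 9])
relative_error = abs(estimate - exact) / exact
print(f"exact={exact:.1f} estimate={estimate:.1f} relative error={relative_error:.5f}")
```

With evenly striped, non-skewed data the scaled partial sum stays close to the exact answer. How small the error is on real workloads depends on the data distribution and on the query, which is what the paper's analytical and experimental evaluation addresses.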