This paper presents and evaluates a simple but very effective method for implementing large data warehouses on an arbitrary number of computers, achieving very high query execution performance and scalability. The data is distributed and processed across a potentially large number of autonomous computers using a technique we call data warehouse striping (DWS). The major problem with DWS is that it would require a very expensive cluster of fault-tolerant computers to prevent a fault in a single computer from stopping the whole system. In this paper, we propose a radically different approach to the unavailability of one or more computers in the cluster, one that allows DWS to be used with a very large number of inexpensive computers. The proposed approach is based on approximate query answering techniques, which make it possible to deliver an approximate answer to the user even when one or more computers in the cluster are unavailable. The evaluation presented in the paper shows, both analytically and experimentally, that the approximate results obtained this way have a very small error that is negligible in most cases.
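The idea of answering a query from only the available computers can be illustrated with a minimal sketch. This is not the paper's implementation: the round-robin placement of fact rows, the scale-up estimator (partial sum multiplied by total nodes over available nodes), and all names such as `stripe_round_robin` and `approximate_sum` are assumptions made for illustration, and the fact values are synthetic.

```python
import random


def stripe_round_robin(rows, n_nodes):
    """Distribute fact rows across n_nodes in round-robin order.

    Assumption: round-robin placement, so each node holds a roughly
    equal-sized and statistically similar slice of the fact data.
    """
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def approximate_sum(nodes, available):
    """Estimate the global SUM using only the available nodes.

    Assumed estimator: scale the partial sum by (total nodes / available
    nodes). With every node available this reduces to the exact sum.
    """
    if not available:
        raise ValueError("at least one node must be available")
    partial = sum(sum(nodes[i]) for i in available)
    return partial * len(nodes) / len(available)


def relative_error(exact, estimate):
    """Relative error of the estimate with respect to the exact answer."""
    return abs(estimate - exact) / abs(exact)


# Synthetic fact measure values (hypothetical data, fixed seed).
rng = random.Random(42)
facts = [rng.uniform(1.0, 100.0) for _ in range(100_000)]

nodes = stripe_round_robin(facts, n_nodes=10)
exact = sum(facts)

# Node 7 is treated as unavailable; the other nine answer the query.
available = [i for i in range(10) if i != 7]
estimate = approximate_sum(nodes, available)
error = relative_error(exact, estimate)
```

Under these assumptions, each node's slice behaves like a random sample of the fact data, so losing one node out of ten changes the scaled estimate only slightly. Skewed data or selective group-by queries would need a more careful error analysis than this sketch provides.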