A performance study of three high availability data replication strategies

Authors:
Hui-I Hsiao;David J. DeWitt
Affiliations:
-;-
Venue:
PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Year:
1991

Citing 0
Cited 11

Database research at Wisconsin
ACM SIGMOD Record

Using shared virtual memory for parallel join processing
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data

On parallel execution of multiple pipelined hash joins
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data

Dynamic Data Reallocation for Skew Management in Shared-Nothing Parallel Databases
Distributed and Parallel Databases

Parallel Execution of Hash Joins in Parallel Databases
IEEE Transactions on Parallel and Distributed Systems

Analytic Modeling and Comparisons of Striping Strategies for Replicated Disk Arrays
IEEE Transactions on Computers

Applying Hash Filters to Improving the Execution of Bushy Trees
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases

On applying hash filters to improving the execution of multi-join queries
The VLDB Journal — The International Journal on Very Large Data Bases

D-SPTF: decentralized request distribution in brick-based storage systems
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems

Model and procedure for performance and availability-wise parallel warehouses
Distributed and Parallel Databases

Load-balancing for WAN warehouses
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications



Abstract

Several data replication strategies have been proposed to provide high data availability for database applications. However, the tradeoffs among the different strategies for various workloads and different operating modes have not been studied before. In this paper, we study the relative performance of chained declustering, mirrored disks, and interleaved declustering in a shared-nothing database machine environment. In particular, we examine the relative performance of the three strategies when no failures have occurred, the effect of load imbalance caused by a disk or processor failure on system throughput and response time, and the tradeoff between the benefit of intra-query parallelism and process scheduling overhead.

Experimental results obtained from a simulation study indicate that, in the normal mode of operation, chained declustering and interleaved declustering perform comparably. Both perform better than mirrored disks if an application is I/O bound, but slightly worse than mirrored disks if the application is CPU bound. In the event of a disk failure, because chained declustering is able to balance the workload among all remaining operational disks while the other two cannot, it provides noticeably better performance than interleaved declustering and much better performance than mirrored disks.
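The load-balancing claim for the failure case can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the simulation from the paper. It assumes the commonly described layouts of the three schemes (which this abstract does not spell out), a uniform workload of one unit per disk before the failure, and a single disk failure. The function names, the pairing of mirrored disks as (0,1), (2,3), ..., and the `cluster_size` parameter are all illustrative assumptions.

```python
from fractions import Fraction


def mirrored_load(n_disks, failed):
    """Assumed layout: disks are paired (0,1), (2,3), ...
    The surviving partner absorbs all of the failed disk's work."""
    load = {d: Fraction(1) for d in range(n_disks) if d != failed}
    load[failed ^ 1] += 1  # partner index within the pair
    return load


def interleaved_load(n_disks, failed, cluster_size):
    """Assumed layout: disks are grouped into clusters; the backup of each
    disk is spread evenly over the other disks of the same cluster."""
    load = {d: Fraction(1) for d in range(n_disks) if d != failed}
    start = (failed // cluster_size) * cluster_size
    for d in range(start, start + cluster_size):
        if d != failed:
            load[d] += Fraction(1, cluster_size - 1)
    return load


def chained_load(n_disks, failed):
    """Assumed layout: the primary copy of fragment i is on disk i and its
    backup is on disk (i + 1) % n_disks. After a failure, work is shifted
    along the chain so that every surviving disk carries the same load."""
    load = {}
    for k in range(1, n_disks):  # k = distance from the failed disk
        disk = (failed + k) % n_disks
        # Share of the previous fragment served from this disk's backup copy.
        backup_share = Fraction(n_disks - k, n_disks - 1)
        # Share of this disk's own fragment still served from its primary copy.
        primary_share = Fraction(k, n_disks - 1)
        load[disk] = backup_share + primary_share
    return load


if __name__ == "__main__":
    n, failed = 8, 1
    schemes = [
        ("mirrored disks", mirrored_load(n, failed)),
        ("interleaved declustering", interleaved_load(n, failed, cluster_size=4)),
        ("chained declustering", chained_load(n, failed)),
    ]
    for name, load in schemes:
        print(name, "busiest surviving disk:", max(load.values()))
```

Under these assumptions, with 8 disks and one failure, the busiest surviving disk carries 2 units under mirrored disks, 4/3 under interleaved declustering with clusters of 4, and 8/7 under chained declustering, which matches the ordering the abstract reports for the failure case.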