Distributed and fault-tolerant execution framework for transaction processing

  • Authors:
  • Toshio Suganuma;Akira Koseki;Kazuaki Ishizaki;Yohei Ueda;Ken Mizuno;Daniel Silva;Hideaki Komatsu;Toshio Nakatani

  • Affiliations:
  • IBM Research - Tokyo, Shimo-tsuruma, Yamato-shi, Japan (all authors)

  • Venue:
  • Proceedings of the 4th Annual International Conference on Systems and Storage
  • Year:
  • 2011

Abstract

There is a growing need for efficient distributed computing for transaction processing. One key requirement for runtime systems in distributed environments is fault tolerance: the system must preserve data consistency at transaction boundaries so that, after any component failure, it can resume ongoing tasks from checkpoints with consistent data. Another key requirement is that the system be lightweight enough in normal execution to provide scalable performance. This paper presents the design and implementation of a new fault-tolerant execution framework that addresses both requirements. We replicate each partition of the distributed persistent data on three nodes (a triplet) with two different types of backups, one using warm replication and the other using cold replication. For node failures, the system recovers automatically unless all three nodes in any triplet fail at the same time. The system tolerates simultaneous two-node failures in any triplet in most cases. We obtain a new trade-off in which a 43% performance improvement can be achieved by slightly compromising system availability.
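
As a rough illustration of the triplet idea described in the abstract, the following Python sketch models only the stated recovery condition: a partition is lost only when all three nodes of its triplet have failed. It is not the authors' implementation. The role names, the round-robin placement policy, and the helper names `place_partitions` and `system_recoverable` are assumptions made for this example, and the sketch does not model the two-node failure cases that the abstract says are not always tolerated.

```python
from dataclasses import dataclass

# Hypothetical role names: one primary copy plus the warm and cold backups
# mentioned in the abstract.
ROLES = ("primary", "warm", "cold")


@dataclass(frozen=True)
class Triplet:
    """One data partition replicated on three nodes (illustrative model only)."""
    partition_id: int
    nodes: tuple  # node names, ordered as in ROLES


def place_partitions(node_names, num_partitions):
    """Assign each partition to three distinct nodes.

    Round-robin placement is an assumption of this sketch; the abstract
    does not say how triplets are chosen.
    """
    n = len(node_names)
    if n < 3:
        raise ValueError("a triplet needs at least three nodes")
    return [
        Triplet(p, tuple(node_names[(p + i) % n] for i in range(len(ROLES))))
        for p in range(num_partitions)
    ]


def system_recoverable(triplets, failed_nodes):
    """Return True unless some triplet has lost all three of its nodes."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in t.nodes)
        for t in triplets
    )


if __name__ == "__main__":
    triplets = place_partitions(["n0", "n1", "n2", "n3"], num_partitions=4)
    print(system_recoverable(triplets, {"n0", "n1"}))
    print(system_recoverable(triplets, {"n0", "n1", "n2"}))
```

In this simplified model, recoverability depends only on whether each triplet retains at least one copy. The distinction between warm and cold backups, which presumably affects recovery time and normal-execution overhead, is left out.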