Parallel programming framework for large batch transaction processing on scale-out systems

  • Authors:
  • Kazuaki Ishizaki;Ken Mizuno;Toshio Suganuma;Daniel Silva;Akira Koseki;Hideaki Komatsu;Yohei Ueda;Toshio Nakatani

  • Affiliations:
  • IBM Research - Tokyo;IBM Research - Tokyo;IBM Japan;IBM Corporation;IBM Research - Tokyo;IBM Research - Tokyo;IBM Research - Tokyo;IBM Research - Tokyo

  • Venue:
  • Proceedings of the 3rd Annual Haifa Experimental Systems Conference
  • Year:
  • 2010

Abstract

A scale-out system is a cluster of commodity machines and offers a good platform for supporting steadily increasing workloads that process growing data sets. Sharding [4] is a method of partitioning data and processing a computation on a scale-out system. In a database system, a large table can be partitioned into small tables so that each node can process its part of the computation. The sharding approach in large batch transaction processing, which is important in the financial domain, presents two hard problems to programmers. Programmers have to write complex code (1) to transfer the input data so as to align the computations with the data partitions, and (2) to manage the distributed transactions. This paper presents a new parallel programming framework that makes parallel transactional programming easier by letting programmers specify transaction scopes and partitioners to simplify the code. A transaction scope consists of a series of subtransactions, each of which performs only local operations; the system manages the distributed transactions automatically. A partitioner represents how the computation should be decomposed and aligned with the data partitions to avoid remote database accesses. Between pairs of subtransactions, the system handles the data shuffling across the network. We implemented our parallel programming framework as a new Java class library that hides all of the complex details of data transfer and distributed transaction management. Our programming framework can eliminate almost 66% of the lines of code compared to a current programming approach without framework support. We also confirmed good scalability, with a scaling factor of 20.6 on 24 nodes using our modified batch program for the TPC-C benchmark.
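
To make the transaction-scope and partitioner concepts concrete, below is a minimal, single-process Java sketch of how such an API might look. All class and method names (Partitioner, Subtransaction, TransactionScope, and so on) are illustrative assumptions, not the paper's actual class library; the sketch only imitates the per-partition shuffling and subtransaction execution in one JVM, whereas the real framework runs subtransactions on the nodes that own the data partitions and manages the distributed commit.

```java
import java.util.*;

// Hypothetical sketch of a transaction-scope/partitioner style API.
// Names and signatures are assumptions for illustration only.

/** Decides which partition (node) a record belongs to, so a
 *  subtransaction touches only its local database shard. */
interface Partitioner<K> {
    int partitionOf(K key, int numPartitions);
}

/** One local unit of work; the real framework would run each
 *  subtransaction on the node owning its partition and manage
 *  the distributed commit across all of them. */
interface Subtransaction<K, V> {
    void run(int partition, List<Map.Entry<K, V>> localRecords);
}

/** Toy, single-process stand-in for a transaction scope: it shuffles
 *  records to partitions using the partitioner, then runs the
 *  subtransaction once per partition. */
class TransactionScope<K, V> {
    private final int numPartitions;
    private final Partitioner<K> partitioner;

    TransactionScope(int numPartitions, Partitioner<K> partitioner) {
        this.numPartitions = numPartitions;
        this.partitioner = partitioner;
    }

    void execute(List<Map.Entry<K, V>> input, Subtransaction<K, V> sub) {
        // "Shuffle": group records by their target partition.
        List<List<Map.Entry<K, V>>> byPartition = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) byPartition.add(new ArrayList<>());
        for (Map.Entry<K, V> rec : input) {
            byPartition.get(partitioner.partitionOf(rec.getKey(), numPartitions)).add(rec);
        }
        // Run each partition's share; a real scale-out runtime would do this
        // on separate nodes and commit them as one distributed transaction.
        for (int p = 0; p < numPartitions; p++) {
            sub.run(p, byPartition.get(p));
        }
    }
}

public class BatchExample {
    public static void main(String[] args) {
        // Example batch: credit account balances, aligned with 4 data partitions.
        List<Map.Entry<Integer, Double>> payments = List.of(
            Map.entry(101, 25.0), Map.entry(205, 40.0),
            Map.entry(310, 15.5), Map.entry(412, 60.0));

        // Illustrative partitioning rule: account id modulo partition count.
        Partitioner<Integer> byAccount = (accountId, n) -> accountId % n;

        TransactionScope<Integer, Double> scope = new TransactionScope<>(4, byAccount);

        scope.execute(payments, (partition, localRecords) -> {
            // Local operations only: in a real batch this would update the
            // shard-local account table inside one subtransaction.
            for (Map.Entry<Integer, Double> rec : localRecords) {
                System.out.printf("partition %d: credit account %d by %.2f%n",
                                  partition, rec.getKey(), rec.getValue());
            }
        });
    }
}
```

In this sketch the programmer supplies only the partitioning rule and the per-partition work; the shuffling of records to partitions and the (here, trivial) transaction management stay inside the library, which is the division of labor the abstract describes.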