Implementation of data affinity-based distributed parallel processing on a distributed key value store

Authors:
Naoko Hishinuma;Atsuko Takefusa;Hidemoto Nakada;Masato Oguchi
Affiliations:
Ochanomizu University, Otsuka, Bunkyo-ku, Tokyo, Japan;National Institute of Advanced Industrial Science and Technology (AIST), Umezono, Tsukuba, Ibaraki, Japan;National Institute of Advanced Industrial Science and Technology (AIST), Umezono, Tsukuba, Ibaraki, Japan;Ochanomizu University, Otsuka, Bunkyo-ku, Tokyo, Japan
Venue:
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
Year:
2014

Abstract

The spread of cloud computing has increased the need to accumulate large amounts of data and to process them at high speed. Because strict consistency is not necessarily required for the large volumes of data stored in the cloud, distributed key-value stores (KVSs), which are based on an eventual consistency paradigm, are considered suitable for storing them. To provide services such as SNS, mining and statistical processing of these data are indispensable. However, because general distributed KVS systems are not designed for data processing, the data must first be transferred to a distributed file system such as HDFS, which does support it, and this transfer incurs a cost. To address this issue, we propose a method that performs high-speed data processing directly on a distributed KVS. In this paper, we extend Apache Cassandra, a distributed KVS that handles large amounts of data, to enable data affinity-based parallel processing. The parallel processing mechanism runs the processing locally on each data node that stores the values, and returns only the results of that processing in response to a request. Evaluation experiments show that the proposed method is faster than the approach typically used with Cassandra. In addition, even when writes are performed in the background while the data are being processed, the processing efficiency remains acceptable for certain write loads. The experimental results show that data processing can be performed while data are being written at approximately 10 Mbyte/s when there are eight data nodes in the experimental environment.
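The data affinity idea described in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and does not use any Cassandra API: the `DataNode` and `Cluster` classes, the MD5-modulo partitioning, and the `count_matching` function are all hypothetical names and choices introduced only for illustration. Each node applies a function to the values it stores, and only the small per-node results are returned and combined.

```python
from concurrent.futures import ThreadPoolExecutor
from hashlib import md5


class DataNode:
    """Hypothetical stand-in for one storage node holding part of the key space."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def process_local(self, func):
        # The function runs where the values live; only its result leaves the node.
        return func(self.store.values())


class Cluster:
    """Hypothetical coordinator that routes writes and fans out processing."""

    def __init__(self, node_count):
        self.nodes = [DataNode(f"node{i}") for i in range(node_count)]

    def _owner(self, key):
        # Simple hash-modulo placement (an assumption, not Cassandra's token ring).
        digest = int(md5(key.encode("utf-8")).hexdigest(), 16)
        return self.nodes[digest % len(self.nodes)]

    def put(self, key, value):
        self._owner(key).put(key, value)

    def process(self, local_func, combine):
        # Ask every node to process its own values in parallel, then merge
        # the partial results instead of shipping the raw values around.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(
                pool.map(lambda node: node.process_local(local_func), self.nodes)
            )
        return combine(partials)


def count_matching(values):
    """Example local job: count stored values that contain the word 'cloud'."""
    return sum(1 for v in values if "cloud" in v)


cluster = Cluster(node_count=8)
for i in range(1000):
    text = "cloud storage" if i % 4 == 0 else "local disk"
    cluster.put(f"key{i}", text)

total = cluster.process(count_matching, sum)
print(total)
```

In this sketch the request transfers eight integers rather than a thousand stored values, which is the kind of saving the abstract attributes to processing data where it is stored instead of moving it to a separate file system first.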