Efficient data distribution strategy for join query processing in the cloud

  • Authors:
  • Haiping Wang;Xiaofeng Meng;Yunpeng Chai

  • Affiliations:
  • Renmin University of China, Beijing, China;Renmin University of China, Beijing, China;Renmin University of China, Beijing, China

  • Venue:
  • Proceedings of the Third International Workshop on Cloud Data Management
  • Year:
  • 2011

Abstract

Managing large-scale data in the cloud offers many advantages, and a growing number of companies are migrating their data into cloud data management systems. Join query processing, however, remains a challenging research problem in the cloud. To complete a join query, data must be transferred among different nodes. The arrangement of data transmission and local data processing is known as the distribution strategy for a query. If the strategy is poorly chosen, the transmission cost (the network workload between servers and the transmission delay) can be very high. Existing cloud systems either do not support join queries or use MapReduce to support only some simple join queries. This paper studies the problem of using redundant data to optimize join queries in a cloud environment. Two novel algorithms, a Set Cover based algorithm (SC) and a Minimum Element based algorithm (ME), are proposed to reduce the data transmission cost. Experimental results demonstrate that the proposed methods greatly reduce the data transmission cost compared with the naive method, and that the results are very close to those of the optimal strategy.
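
The abstract does not describe how SC or ME work internally, so the following is not the authors' algorithm. It is only a minimal sketch of the classic greedy set cover heuristic, applied to a simplified replica-selection setting to illustrate why redundant data can reduce transfers. The function name `greedy_set_cover`, the node names, and the partition ids are all hypothetical, and the model assumes that each node stores a set of partitions and that contacting fewer nodes is a rough proxy for lower transmission cost.

```python
def greedy_set_cover(required, replicas):
    """Pick nodes whose local partitions together cover `required`.

    required: set of partition ids needed by the join (hypothetical ids).
    replicas: dict mapping node name -> set of partition ids stored there.
    Returns the chosen node names in the order they were picked.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        # Pick the node covering the most still-uncovered partitions;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(replicas), key=lambda n: len(replicas[n] & uncovered))
        gained = replicas[best] & uncovered
        if not gained:
            raise ValueError(f"no replica holds partitions {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical placement: partitions of relations R and S replicated on 3 nodes.
replicas = {
    "node_a": {"R1", "R2", "S1"},
    "node_b": {"R2", "S1", "S2"},
    "node_c": {"S2"},
}
plan = greedy_set_cover({"R1", "R2", "S1", "S2"}, replicas)
print(plan)
```

In this toy placement, two nodes suffice to supply every partition the join needs, whereas a strategy that ignored replicas might contact all three. The greedy heuristic is an approximation and does not guarantee the smallest possible set of nodes.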