X10-enabled MapReduce

  • Authors: Han Dong; Shujia Zhou; David Grove

  • Affiliations: University of Maryland Baltimore County; University of Maryland Baltimore County; IBM

  • Venue: Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
  • Year: 2010

Abstract

The MapReduce framework has become a popular and powerful tool for processing large datasets in parallel over a cluster of computing nodes [1]. Many implementations of MapReduce currently exist, the most popular of which is the Hadoop implementation in Java [5]. However, these implementations either rely on third-party file systems for communication across compute nodes or are difficult to implement using socket programming or communication libraries such as MPI. To address these challenges, we investigated using the X10 language to implement MapReduce and tested it with the word-count use case. The key performance factor in implementing MapReduce is the movement of data across compute nodes. Because X10 provides built-in constructs for cross-node communication, such as distributed arrays [2], it readily addresses this major challenge of MapReduce implementations. We tested two main implementations: the first uses the HashMap data structure, and the second uses a Rail whose elements are string and integer pairs. The performance of these two implementations is analyzed and discussed.
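To make the contrast between the two data structures concrete, the following is a minimal single-process sketch of word count in Python, not the authors' X10 code. It assumes that each input chunk stands in for the data held on one node, that a Python dict plays the role of the HashMap, and that a list of (word, count) tuples plays the role of the Rail of string and integer pairs. The function names and the linear-scan lookup in the pair variant are illustrative assumptions; the abstract does not say how the Rail version locates a word.

```python
# Hypothetical illustration: word count with two intermediate data structures.
# Each chunk stands in for the text held on one compute node.

def map_hashmap(chunk):
    """Map phase: count words in one chunk using a dict (HashMap analogue)."""
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def reduce_hashmap(partials):
    """Reduce phase: merge the per-chunk dicts into one global dict."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total

def add_to_pairs(pairs, word, n):
    """Add n to word's count in a list of (word, count) pairs.

    Uses a linear scan (an assumption made for this sketch), so each
    update costs time proportional to the number of distinct words.
    """
    for i, (w, m) in enumerate(pairs):
        if w == word:
            pairs[i] = (w, m + n)
            return
    pairs.append((word, n))

def map_pairs(chunk):
    """Map phase: count words in one chunk using a list of pairs (Rail analogue)."""
    pairs = []
    for word in chunk.split():
        add_to_pairs(pairs, word, 1)
    return pairs

def reduce_pairs(partials):
    """Reduce phase: merge the per-chunk pair lists into one global list."""
    total = []
    for partial in partials:
        for word, n in partial:
            add_to_pairs(total, word, n)
    return total

chunks = ["the quick brown fox", "the lazy dog", "the fox"]
hash_result = reduce_hashmap([map_hashmap(c) for c in chunks])
pair_result = reduce_pairs([map_pairs(c) for c in chunks])
```

Both variants produce the same counts; they differ in lookup cost and in how the intermediate results are laid out, which matters when those results have to be moved between nodes.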