An implementation of the codelet model

  • Authors:
  • Joshua Suettlerlein;Stéphane Zuckerman;Guang R. Gao

  • Affiliations:
  • University of Delaware, Newark, DE;University of Delaware, Newark, DE;University of Delaware, Newark, DE

  • Venue:
  • Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Chip architectures are shifting from few, faster, functionally heavy cores to abundant, slower, simpler cores to address pressing physical limitations such as energy consumption and heat expenditure. As architectural trends continue to fluctuate, we propose a novel program execution model, the Codelet model, which is designed for new systems tasked with efficiently managing varying resources. The Codelet model is a fine-grained dataflow inspired model extended to address the cumbersome resources available in new architectures. In the following, we define the Codelet execution model as well as provide an implementation named DARTS. Utilizing DARTS and two predominant kernels, matrix multiplication and the Graph 500's breadth first search, we explore the validity of fine-grain execution as a promising and viable execution model for future and current architectures. We show that our runtime is on par or performs better than AMD's highly-optimized parallel library for matrix multication, outperforming it on average by 1.40× with a speedup up to 4×. Our implementation of the parallel BFS outperforms Graph 500's reference implementation (with or without dynamic scheduling) on average by 1.50× with a speed up of up to 2.38×.