A Discussion in Favor of Dynamic Scheduling for Regular Applications in Many-core Architectures

  • Authors:
  • Elkin Garcia; Daniel Orozco; Robert Pavel; Guang R. Gao

  • Venue:
  • IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
  • Year:
  • 2012

Abstract

The recent evolution of many-core architectures has produced chips where the number of processing elements (PEs) is in the hundreds and continues to grow. In addition, many-core processors are increasingly characterized by the diversity of their resources and by the way the sharing of those resources is arbitrated. On such machines, task scheduling is of paramount importance for orchestrating a satisfactory distribution of tasks with efficient utilization of resources, especially when fine-grain parallelism is desired or required. In the past, the primary focus of scheduling techniques has been on achieving load balance and reducing overhead with the aim of increasing total performance. This focus has resulted in a scheduling paradigm where Static Scheduling (SS) is preferred to Dynamic Scheduling (DS) for highly regular and embarrassingly parallel applications running on homogeneous architectures. We revisit the task scheduling problem for these types of applications under the conditions imposed by many-core architectures to investigate whether scenarios exist in which DS is better than SS. Our main contribution is the idea that, for highly regular and embarrassingly parallel applications, DS is preferable to SS in some situations commonly found on many-core architectures. We present experimental evidence showing how the performance of SS degrades in the new environment of many-core chips. We analyze three reasons that contribute to the superiority of DS over SS on many-core architectures under the situations described: 1) a uniform mapping of work to processors that ignores task granularity is not necessarily scalable when the amount of work is limited; 2) shared resources (e.g., the crossbar switch) produce unexpected, stochastic variations in task duration that SS is unable to manage properly; 3) hardware features, such as in-memory atomic operations, greatly reduce the overhead of DS.
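To make the SS/DS contrast concrete, the minimal sketch below (not code from the paper; the thread count, chunk size, and work() body are illustrative assumptions) shows both policies for a regular loop in C11. The dynamic variant claims chunks of iterations with a single atomic fetch-and-add on a shared counter, a software analogue of the in-memory atomic operations that point 3 credits with lowering DS overhead.

```c
/* Sketch: static vs. dynamic scheduling of a regular loop.
 * Compile with: cc -std=c11 -pthread sched.c                  */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N        (1L << 20)  /* total iterations (illustrative)   */
#define NTHREADS 4
#define CHUNK    1024        /* task granularity for dynamic mode */

static double data[N];
static atomic_long next = 0; /* shared work counter for DS        */

static void work(long i) { data[i] = (double)i * 0.5; } /* stand-in body */

/* Static scheduling: each thread owns a fixed, precomputed range. */
static void *run_static(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
    for (long i = lo; i < hi; i++) work(i);
    return NULL;
}

/* Dynamic scheduling: threads repeatedly claim CHUNK iterations
 * with one atomic fetch-and-add, so faster threads take more work. */
static void *run_dynamic(void *arg) {
    (void)arg;
    for (;;) {
        long lo = atomic_fetch_add(&next, CHUNK);
        if (lo >= N) break;
        long hi = (lo + CHUNK < N) ? lo + CHUNK : N;
        for (long i = lo; i < hi; i++) work(i);
    }
    return NULL;
}

static void run(void *(*policy)(void *)) {
    pthread_t t[NTHREADS];
    atomic_store(&next, 0);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, policy, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
}

int main(void) {
    run(run_static);  /* fixed ranges, no runtime coordination   */
    run(run_dynamic); /* atomic self-scheduling over shared counter */
    printf("data[N-1] = %f\n", data[N - 1]);
    return 0;
}
```

Under uniform task durations the two policies finish together, but if shared-resource contention makes some iterations stochastically slower (point 2 above), the dynamic variant lets unimpeded threads absorb the remaining chunks, while the static partition leaves them idle until the slowest range completes.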