Task scheduling on manycore processors with home caches

Authors:
Ananya Muddukrishna;Artur Podobas;Mats Brorsson;Vladimir Vlassov
Affiliations:
KTH Royal Institute of Technology, Sweden;KTH Royal Institute of Technology, Sweden;KTH Royal Institute of Technology, Sweden;KTH Royal Institute of Technology, Sweden
Venue:
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Year:
2012

Abstract

Modern manycore processors feature a highly scalable, software-configurable cache hierarchy. To obtain good performance, manycore programmers must not only use the large number of cores efficiently but also understand the cache hierarchy and configure it to suit the application. Task-based programming models can relieve programmers of this burden: programmers express parallelism as tasks, and an architecture-specific runtime system maps the tasks to cores and also configures the cache hierarchy. In this paper, we focus on the cache hierarchy of the Tilera TILEPro64 processor, which features a software-configurable coherence waypoint called the home cache. We first show that a runtime system that schedules tasks without regard to the nature of home caches suffers a performance bottleneck. We then demonstrate a technique in which the runtime system controls the assignment of home caches to memory blocks and schedules tasks to minimize home cache access penalties. Tests of our technique show a significant improvement in execution time on selected benchmarks. We conclude that, by taking processor architecture features into account, task-based programming models can continue to deliver performance and allow programmers to transition smoothly from the multicore to the manycore era.
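The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of home-cache-aware scheduling, not the authors' implementation. It assumes an 8x8 mesh of tiles, a cost model in which each access to a block costs its number of accesses times the mesh hop distance to the block's home tile, and a first-touch-like policy that homes a not-yet-homed block on the tile that first runs a task using it. The names `Task`, `HomeAwareRuntime`, `hops`, `cost`, and `schedule` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

MESH_WIDTH = 8  # assumption: tiles laid out as an 8x8 mesh, numbered row-major


def hops(a: int, b: int) -> int:
    """Manhattan (hop) distance between tiles a and b on the assumed mesh."""
    ay, ax = divmod(a, MESH_WIDTH)
    by, bx = divmod(b, MESH_WIDTH)
    return abs(ay - by) + abs(ax - bx)


@dataclass
class Task:
    name: str
    blocks: dict[int, int]  # memory block id -> number of accesses (hypothetical)


class HomeAwareRuntime:
    """Hypothetical runtime that tracks block homes and places tasks near them."""

    def __init__(self) -> None:
        self.home: dict[int, int] = {}  # memory block id -> home tile

    def cost(self, task: Task, tile: int) -> int:
        # Assumed penalty: accesses times hops to the block's home tile.
        # Blocks without a home cost nothing, because schedule() homes them
        # on the tile that runs the task.
        return sum(n * hops(tile, self.home[b])
                   for b, n in task.blocks.items() if b in self.home)

    def schedule(self, task: Task, idle: list[int]) -> int:
        # Pick the idle tile with the lowest penalty; break ties by tile id.
        tile = min(idle, key=lambda t: (self.cost(task, t), t))
        # First-touch-like homing (an assumption): unhomed blocks live here.
        for b in task.blocks:
            self.home.setdefault(b, tile)
        return tile
```

In this sketch a producer task homes its output blocks on the tile it runs on, and a later consumer of those blocks is steered to the nearest idle tile, which is the kind of placement decision a home-oblivious scheduler would not make.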