Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications

Authors:
Luís Fabrício Góes;Christiane Pousa Ribeiro;Márcio Castro;Jean-François Méhaut;Murray Cole;Marcelo Cintra
Affiliations:
PPGEE, GSDC Group, Pontifícia Universidade Católica de Minas Gerais, Belo Horizonte, Brazil;INRIA, CEA, LIG Laboratory, Grenoble University, Grenoble, France;INRIA, CEA, LIG Laboratory, Grenoble University, Grenoble, France;INRIA, CEA, LIG Laboratory, Grenoble University, Grenoble, France;School of Informatics, ICSA, CARD Group, University of Edinburgh, Edinburgh, UK;School of Informatics, ICSA, CARD Group, University of Edinburgh, Edinburgh, UK
Venue:
International Journal of Parallel Programming
Year:
2014

Abstract

Memory affinity has become a key element in achieving scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and cache prefetching are commonly employed to enhance memory affinity, which keeps data close to the cores that access it. In particular, software transactional memory (STM) applications exhibit irregular memory access behavior, which makes it harder to determine which data will be needed by each core and when. Additionally, existing STM runtime systems are decoupled from issues such as thread and memory management. In this paper, we thus propose a skeleton-driven mechanism that improves memory affinity in STM applications that fit the worklist pattern, employing a two-level approach. First, it addresses memory affinity at the DRAM level by automatically selecting page allocation policies. Then, it employs data prefetching helper threads to improve affinity at the cache level. The mechanism relies on a skeleton framework to exploit the application pattern in order to provide automatic memory page allocation and cache prefetching. Our experimental results on the STAMP benchmark suite show that our proposed mechanism can achieve performance improvements of up to 46%, with an average of 11%, over a baseline version on two NUMA multi-core machines.
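
To make the DRAM-level idea concrete, the following is a toy model of how a page allocation policy affects how often a thread finds its data on its own NUMA node. It is only a sketch under stated assumptions, not the mechanism described in the paper: the abstract does not name the policies or the selection criterion, so the policy names (`bind`, `cyclic`), the helper functions and the "highest local-access ratio" rule are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

# An access is a (node_of_accessing_thread, page_index) pair.
Access = Tuple[int, int]
Policy = Callable[[int, int], int]


def bind_policy(page: int, num_nodes: int, home_node: int = 0) -> int:
    """Hypothetical policy: place every page on a single NUMA node."""
    return home_node


def cyclic_policy(page: int, num_nodes: int) -> int:
    """Hypothetical policy: spread pages round-robin across NUMA nodes."""
    return page % num_nodes


def local_access_ratio(accesses: List[Access], policy: Policy, num_nodes: int) -> float:
    """Fraction of accesses whose page lives on the accessing thread's node."""
    if not accesses:
        return 0.0
    local = sum(1 for node, page in accesses if policy(page, num_nodes) == node)
    return local / len(accesses)


def select_policy(accesses: List[Access], policies: Dict[str, Policy], num_nodes: int) -> str:
    """Assumed selection rule: pick the policy with the most local accesses."""
    return max(policies, key=lambda name: local_access_ratio(accesses, policies[name], num_nodes))


POLICIES: Dict[str, Policy] = {"bind": bind_policy, "cyclic": cyclic_policy}

# Threads on node 0 touch even pages and threads on node 1 touch odd pages.
spread_accesses: List[Access] = [(0, 0), (1, 1), (0, 2), (1, 3), (0, 4), (1, 5)]

# All accesses come from threads on node 0.
single_node_accesses: List[Access] = [(0, p) for p in range(6)]
```

In this model, `select_policy(spread_accesses, POLICIES, 2)` favors the round-robin placement, while `select_policy(single_node_accesses, POLICIES, 2)` favors binding all pages to node 0. A real system would apply the chosen placement through the operating system's NUMA interfaces rather than computing it in user code as done here.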