Case studies of multi-core energy efficiency in task based programs

Authors:
Hallgeir Lien;Lasse Natvig;Abdullah Al Hasib;Jan Christian Meyer
Affiliations:
Dept. of Computer and Information Science (IDI), NTNU, Trondheim, Norway;Dept. of Computer and Information Science (IDI), NTNU, Trondheim, Norway;Dept. of Computer and Information Science (IDI), NTNU, Trondheim, Norway;High Performance Computing Section, IT Dept., NTNU, Norway
Venue:
ICT-GLOW'12 Proceedings of the Second international conference on ICT as Key Technology against Global Warming
Year:
2012

Abstract

In this paper, we present three performance and energy case studies of benchmark applications in the OmpSs environment for task-based programming. Different parallel and vectorized implementations are evaluated on an Intel® Core™ i7-2600 quad-core processor. Using FLOPS/W derived from on-chip model-specific registers (MSRs), we find AVX code to be clearly the most energy efficient in general. The peak on-chip GFLOPS/W rates are: Black-Scholes (BS) 0.89, FFTW 1.38, and Matrix Multiply (MM) 1.97. Experiments cover variable degrees of thread parallelism and different OmpSs task pool scheduling policies. We find that maximum energy efficiency for small and medium-sized problems is obtained by limiting the number of parallel threads. Comparison of AVX variants with non-vectorized code shows ≈6–7× (BS) and ≈3–5× (FFTW) improvements in on-chip energy efficiency, depending on the problem size and degree of multithreading.
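
As an illustration of how a FLOPS/W figure can be derived from on-chip energy counters, the sketch below converts two raw counter readings into joules and divides the operation count by that energy. The abstract does not say which registers were read or how the arithmetic was done, so this is an assumption: it models a cumulative 32-bit energy counter whose unit is 1/2^ESU joules, as in Intel's RAPL interface. The function names and the example values in the usage note are hypothetical.

```python
def energy_joules(raw_start, raw_end, energy_status_units):
    """Convert two raw readings of a cumulative energy counter to joules.

    Assumption (not stated in the abstract): the counter is 32 bits wide
    and counts in units of 1 / 2**energy_status_units joules.
    """
    # Modular subtraction tolerates a single counter wraparound.
    delta = (raw_end - raw_start) % (1 << 32)
    return delta / float(1 << energy_status_units)


def gflops_per_watt(flop_count, joules):
    """Energy efficiency in GFLOPS/W.

    FLOPS/W = (FLOP/s) / (J/s) = FLOP/J, so the run time cancels out.
    """
    return flop_count / joules / 1e9
```

For a dense n × n matrix multiply, the usual operation count is 2n³, so a hypothetical run would be scored as `gflops_per_watt(2 * n**3, energy_joules(before, after, esu))`, where `before`, `after`, and `esu` are values read from the hardware.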