Energy consumption is a major concern with high-performance multicore systems. In this paper, we explore the energy consumption and performance (execution time) characteristics of different parallel implementations of scientific applications. In particular, the experiments focus on message-passing interface (MPI)-only versus hybrid MPI/OpenMP implementations of the NAS (NASA Advanced Supercomputing) BT (Block Tridiagonal) benchmark (strong scaling), a Lattice Boltzmann application (strong scaling), and the Gyrokinetic Toroidal Code (GTC; weak scaling), as well as central processing unit (CPU) frequency scaling. Experiments were conducted on a system instrumented to obtain power information; this system consists of eight nodes with four cores per node. The results indicate that, on 16 or fewer cores, whether the MPI-only or the hybrid implementation performs best depends on the application. On 32 cores, the results were consistent: the hybrid implementation yielded both lower execution time and lower energy consumption. With CPU frequency scaling, the best case for energy savings was not the best case for execution time.
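The last observation follows from the fact that energy is average power integrated over execution time: lowering the CPU frequency reduces power but lengthens the run, so the frequency that minimizes time need not minimize energy. A minimal sketch of this trade-off, using purely hypothetical power/time numbers (not measured data from the paper):

```python
def energy_joules(avg_power_watts, exec_time_s):
    """Energy = average power x execution time."""
    return avg_power_watts * exec_time_s

# Hypothetical runs of the same workload at two CPU frequency settings.
high_freq = energy_joules(avg_power_watts=200.0, exec_time_s=100.0)  # 20000.0 J
low_freq = energy_joules(avg_power_watts=150.0, exec_time_s=125.0)   # 18750.0 J

# The high-frequency run finishes sooner (best for execution time), yet the
# low-frequency run consumes less total energy (best for energy savings).
assert low_freq < high_freq
```

In this sketch the 25% power reduction outweighs the 25% slowdown, so the energy-optimal frequency differs from the time-optimal one; with a larger slowdown the ranking could flip.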