Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models

Authors:
Abhinav Vishnu;Shuaiwen Song;Andres Marquez;Kevin Barker;Darren Kerbyson;Kirk Cameron;Pavan Balaji
Venue:
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Year:
2010



Abstract

The insatiable demand for high performance computing is driven by the most computationally intensive applications, such as computational chemistry, climate modeling, and nuclear physics. The last couple of decades have seen a tremendous rise in supercomputers, with architectures ranging from traditional clusters to system-on-a-chip designs, built to break the petaflop computing barrier. With the advent of petaflop-plus computing, however, we have entered an era in which a power efficient system software stack is imperative for execution on exascale systems and beyond. At the same time, computationally intensive applications are exploring programming models beyond traditional message passing, such as Partitioned Global Address Space (PGAS) languages and libraries, which provide a one-sided communication paradigm with put, get, and accumulate primitives. To support the PGAS models, it is critical to design power efficient, high performance one-sided communication runtime systems. In this paper, we design and implement PASCoL, a high performance, power aware one-sided communication library built on the Aggregate Remote Memory Copy Interface (ARMCI), the communication runtime system of Global Arrays. For the various communication primitives provided by ARMCI, we study the impact of Dynamic Voltage/Frequency Scaling (DVFS) and of combining the interrupt-based (blocking) and polling-based mechanisms provided by most modern interconnects. We implement our design and evaluate it with synthetic benchmarks on an InfiniBand cluster. Our results indicate that PASCoL can achieve a significant reduction in energy consumed per byte transferred, without additional penalty, for various one-sided communication primitives, message sizes, and data transfer patterns.
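
The trade-off the abstract describes, polling at full CPU frequency versus blocking on an interrupt with the CPU scaled down through DVFS, can be illustrated with a small model of energy per byte. This is only a sketch under stated assumptions, not PASCoL's implementation: the `WaitPolicy` type, the `pick_policy` helper, the 5% slowdown bound, and every power and latency figure are hypothetical values chosen for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitPolicy:
    """How the CPU waits for a one-sided transfer to complete (hypothetical model)."""
    name: str
    wait_power_watts: float  # assumed CPU power draw while waiting
    added_latency_s: float   # assumed extra delay (interrupt wake-up, DVFS switch)


def energy_per_byte(policy: WaitPolicy, message_bytes: int, transfer_time_s: float) -> float:
    """Joules per byte for one transfer during which the CPU only waits."""
    total_time_s = transfer_time_s + policy.added_latency_s
    return policy.wait_power_watts * total_time_s / message_bytes


def pick_policy(policies, message_bytes, transfer_time_s, max_slowdown=0.05):
    """Return the lowest energy-per-byte policy whose added latency stays
    within max_slowdown of the raw transfer time."""
    allowed = [p for p in policies
               if p.added_latency_s <= max_slowdown * transfer_time_s]
    return min(allowed, key=lambda p: energy_per_byte(p, message_bytes, transfer_time_s))


# Illustrative numbers only; they are assumptions, not measurements.
POLICIES = [
    WaitPolicy("polling", wait_power_watts=100.0, added_latency_s=0.0),
    WaitPolicy("interrupt+dvfs", wait_power_watts=60.0, added_latency_s=20e-6),
]

if __name__ == "__main__":
    for size, t in [(8, 5e-6), (1 << 20, 1e-3)]:
        best = pick_policy(POLICIES, size, t)
        print(f"{size:>8} bytes: {best.name}, "
              f"{energy_per_byte(best, size, t):.3e} J/byte")
```

In this toy model, short transfers stay with polling because the assumed wake-up cost would dominate their latency, while long transfers switch to the interrupt-plus-DVFS policy, which lowers energy per byte at a bounded slowdown.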