The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Comparing power consumption of an SMT and a CMP DSP for mobile phone workloads
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Proceedings of the 39th annual Design Automation Conference
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Data Mining and Knowledge Discovery
Design Challenges of Technology Scaling
IEEE Micro
Exploring the Design Space of Future CMPs
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Power-Sensitive Multithreaded Architecture
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors
Proceedings of the conference on Design, automation and test in Europe - Volume 2
The energy efficiency of CMP vs. SMT for multimedia workloads
Proceedings of the 18th annual international conference on Supercomputing
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Understanding the energy efficiency of simultaneous multithreading
Proceedings of the 2004 international symposium on Low power electronics and design
Power-optimal pipelining in deep submicron technology
Proceedings of the 2004 international symposium on Low power electronics and design
Best of Both Latency and Throughput
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Performance, Energy, and Thermal Considerations for SMT and CMP Architectures
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
The Thrifty Barrier: Energy-Aware Synchronization in Shared-Memory Multiprocessors
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
On evaluating request-distribution schemes for saving energy in server clusters
ISPASS '03 Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software
Quantitative performance analysis of the SPEC OMPM2001 benchmarks
Scientific Programming - OpenMP
Energy conservation policies for web servers
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Energy-efficient server clusters
PACS'02 Proceedings of the 2nd international conference on Power-aware computer systems
Integrated analysis of power and performance for pipelined microprocessors
IEEE Transactions on Computers
Using fine grain multithreading for energy efficient computing
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Towards optimizing energy costs of algorithms for shared memory architectures
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Area-efficient floorplans and interconnects for homogeneous multi-core architectures
International Journal of High Performance Systems Architecture
An integrated GPU power and performance model
Proceedings of the 37th annual international symposium on Computer architecture
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Scheduling of stream-based real-time applications for heterogeneous systems
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Case studies of multi-core energy efficiency in task based programs
ICT-GLOW'12 Proceedings of the Second international conference on ICT as Key Technology against Global Warming
Computational sprinting on a hardware/software testbed
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Exploring hybrid memory for GPU energy efficiency through software-hardware co-design
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
This paper looks at the power-performance implications of running parallel applications on chip multiprocessors (CMPs). First, we develop an analytical model that, for the first time, puts together parallel efficiency, granularity of parallelism, and voltage/frequency scaling, to establish a formal connection with the power consumption and performance of a parallel code running on a CMP. We then conduct detailed simulations of parallel applications running on a detailed power-performance CMP model to confirm the analytical results and provide further insights. Both analytical and experimental models show that parallel computing can bring significant power savings and still meet a given performance target by choosing granularity and voltage/frequency levels judiciously. The particular choice, however, is dependent on the application's parallel efficiency curve and the process technology utilized, which our model captures. Likewise, analytical model and experiments show the effect of a limited power budget on the application's scalability curve. In particular, we show that a limited power budget can cause a rapid performance degradation beyond a number of cores, even in the case of applications with excellent scalability properties. On the other hand, our experiments show that, when a limited power budget is in place, power-thrifty memory-bound applications may actually enjoy better scalability than more compute-intensive codes, even if the latter would exhibit higher scalability in a power-unconstrained scenario.