Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Design Challenges of Technology Scaling
IEEE Micro
Compile/Run-Time Support for Thread Migration
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Proceedings of the 30th annual international symposium on Computer architecture
Reducing power density through activity migration
Proceedings of the 2003 international symposium on Low power electronics and design
Exploiting Choice in Resizable Cache Design to Optimize Deep-Submicron Processor Energy-Delay
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Comparing Program Phase Detection Techniques
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Power-aware communication optimization for networks-on-chips with voltage scalable links
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Heat-and-run: leveraging SMT and CMP to manage power density through the operating system
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Adaptive execution techniques for SMT multiprocessor architectures
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Exploiting the Cache Capacity of a Single-Chip Multi-Core Processor with Execution Migration
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Design and analysis of an NoC architecture from performance, reliability and energy perspective
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Performance implications of single thread migration on a chip multi-core
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Migration in Single Chip Multiprocessors
IEEE Computer Architecture Letters
Compiler-Directed Power Density Reduction in NoC-Based Multi-Core Designs
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
An ILP based approach to address code generation for digital signal processors
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
Chip multiprocessing and the cell broadband engine
Proceedings of the 3rd conference on Computing frontiers
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
Computation spreading: employing hardware migration to specialize CMP cores on-the-fly
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Online power-performance adaptation of multithreaded programs using hardware event-based prediction
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Reparallelization and Migration of OpenMP Programs
CCGRID '07 Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid
Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications
IEEE Transactions on Parallel and Distributed Systems
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Dynamic voltage frequency scaling for multi-tasking systems using online learning
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
A Framework for Providing Quality of Service in Chip Multi-Processors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parcae: a system for flexible parallel execution
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Hi-index | 0.00 |
In parallel to the changes in both the architecture domain-the move toward chip multiprocessors (CMPs)-and the application domain-the move toward increasingly data-intensive workloads-issues such as performance, energy efficiency and CPU availability are becoming increasingly critical. The CPU availability can change dynamically due to several reasons such as thermal overload, increase in transient errors, or operating system scheduling. An important question in this context is how to adapt, in a CMP, the execution of a given application to CPU availability change at runtime. Our paper studies this problem, targeting the energy-delay product (EDP) as the main metric to optimize. We first discuss that, in adapting the application execution to the varying CPU availability, one needs to consider the number of CPUs to use, the number of application threads to accommodate and the voltage/frequency levels to employ (if the CMP has this capability). We then propose to use helper threads to adapt the application execution to CPU availability change in general with the goal of minimizing the EDP. The helper thread runs parallel to the application execution threads and tries to determine the ideal number of CPUs, threads and voltage/frequency levels to employ at any given point in execution. We illustrate this idea using four applications (Fast Fourier Transform, MultiGrid, LU decomposition and Conjugate Gradient) under different execution scenarios. The results collected through our experiments are very promising and indicate that significant EDP reductions are possible using helper threads. For example, we achieved up to 66.3%, 83.3%, 91.2%, and 94.2% savings in EDP when adjusting all the parameters properly in applications FFT, MG, LU, and CG, respectively. We also discuss how our approach can be extended to address multi-programmed workloads.