Automatic and interactive parallelization
Automatic and interactive parallelization
System-level power optimization: techniques and tools
ACM Transactions on Design Automation of Electronic Systems (TODAES)
The design and use of simplepower: a cycle-accurate energy estimation tool
Proceedings of the 37th Annual Design Automation Conference
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Power aware microarchitecture resource scaling
Proceedings of the conference on Design, automation and test in Europe
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Proceedings of the 39th annual Design Automation Conference
Design of High-Performance Microprocessor Circuits
Design of High-Performance Microprocessor Circuits
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Online strategies for high-performance power-aware thread execution on emerging multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Shared Register File Based ILP for Multicore
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Hi-index | 0.00 |
Chip multiprocessing (or multiprocessor system-on-a-chip) is a technique that combines two or more processor cores on a single piece of silicon to enhance computing performance. An important problem to be addressed in executing applications on an on-chip multiprocessor environment is to select the most suitable number of processors to use for a given objective function (e.g., minimizing execution time or energy-delay product) under multiple constraints. Previous research proposed an ILP-based solution to this problem that is based on exhaustive evaluation of each nest under all possible processor sizes. In this paper, we take a different approach and propose a pure runtime strategy for determining the best number of processors to use at run-time. This approach is more general than static techniques and can be applicable in situations where the latter cannot be.