Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler optimizations for eliminating barrier synchronization
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Comparing algorithm for dynamic speed-setting of a low-power CPU
MobiCom '95 Proceedings of the 1st annual international conference on Mobile computing and networking
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Energy-conscious compilation based on voltage scaling
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Proceedings of the 39th annual Design Automation Conference
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Power optimization of real-time embedded systems on variable speed processors
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Real-Time Task Scheduling for a Variable Voltage Processor
Proceedings of the 12th international symposium on System synthesis
Policies for dynamic clock scheduling
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Scheduling for reduced CPU energy
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Dynamic voltage and frequency scaling for scientific applications
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Proceedings of the 42nd annual Design Automation Conference
Software-directed power-aware interconnection networks
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Software-directed power-aware interconnection networks
ACM Transactions on Architecture and Code Optimization (TACO)
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Accomodating Diversity in CMPs with Heterogeneous Frequencies
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Operating system scheduling for efficient online self-test in robust systems
Proceedings of the 2009 International Conference on Computer-Aided Design
Comparing scalability prediction strategies on an SMP of CMPs
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
An effective speedup metric for measuring productivity in large-scale parallel computer systems
The Journal of Supercomputing
An energy-efficient heterogeneous CMP based on hybrid TFET-CMOS cores
Proceedings of the 48th Design Automation Conference
Compile-Time energy optimization for parallel applications in on-chip multiprocessors
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Evaluation of Low-Power Computing when Operating on Subsets of Multicore Processors
Journal of Signal Processing Systems
Hi-index | 0.00 |
Advances in semiconductor technology are enabling designs with several hundred million transistors. Since building sophisticated single processor based systems is a complex process from design, veri.cation, and software development perspectives, the use of chip multiprocessing is inevitable in future microprocessors. In fact, the abundance of explicit loop-level parallelism in many embedded applications helps us identify chip multiprocessing as one of the most promising directions in designing systems for embedded applications. Another architectural trend that we observe in embedded systems, namely, multi-voltage processors, is driven by the need of reducing energy consumption during program execution. Practical implementations such as Transmeta's Crusoe and Intel's XScale tune processor voltage/frequency depending on current execution load. Considering these two trends, chip multiprocessing and voltage/frequency scaling, this paper presents an optimization strategy for an architecture that makes use of both chip parallelism and voltage scaling. In our proposal, the compiler takes advantage of heterogeneity in parallel execution between the loads of different processors and assigns different voltages/frequencies to different processors if doingso reduces energy consumption without increasing overall execution cycles signi.cantly. Our experiments with a set of applications show that this optimization can bring large energy benefits without much performance loss.