Optimizing Array-Intensive Applications for On-Chip Multiprocessors

Authors:
Ismail Kadayif; Mahmut Kandemir; Guilin Chen; Ozcan Ozturk; Mustafa Karakoy; Ugur Sezer
Affiliations:
IEEE; IEEE; IEEE; IEEE; -; IEEE
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2005

Abstract

With energy consumption becoming a first-class optimization parameter in computer system design, compilation techniques that consider performance and energy simultaneously are expected to play a central role. In particular, compiling a given application under performance and energy constraints is becoming an important problem. In this paper, we focus on an on-chip multiprocessor architecture and present a set of code optimization strategies. We first evaluate an adaptive loop parallelization strategy (i.e., a strategy that allows each loop nest to execute on a different number of processors if doing so is beneficial) and measure the potential energy savings when the processors left unused during the execution of a loop nest are shut down (i.e., placed into a power-down or sleep state). Our results show that shutting down unused processors can yield energy savings of as much as 67 percent at the expense of up to 17 percent performance loss in a set of array-intensive applications. To eliminate this performance penalty, we also discuss and evaluate a processor preactivation strategy based on compile-time analysis of nested loops. Based on our experiments, we conclude that adaptive loop parallelization combined with idle-processor shutdown and preactivation can be very effective in reducing energy consumption without increasing execution time. We then generalize our strategy and present an application parallelization strategy based on integer linear programming (ILP). Given an array-intensive application, our optimization strategy determines the number of processors to use for each loop nest based on the objective function and the additional compilation constraints provided by the user/programmer. Our initial experience with this constraint-based optimization strategy shows that it is very successful in optimizing array-intensive applications for on-chip multiprocessors under multiple energy and performance constraints.
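To make the trade-off concrete, here is a minimal sketch, not the authors' implementation, of choosing a processor count per loop nest with unused processors shut down, with and without preactivation. Everything in it is an assumption for illustration: the profile `PROFILE` (cycles per nest for each processor count), the constants `ACTIVE_POWER`, `SLEEP_POWER` and `WAKEUP_LATENCY`, and the helpers `nest_cost`, `schedule_cost` and `best_assignment` are hypothetical. The paper formulates the selection as an ILP; this sketch instead enumerates exhaustively the same per-nest choices that the ILP's 0-1 variables would encode, which is only practical for a handful of nests.

```python
from itertools import product

# Illustrative constants (assumed, not taken from the paper):
ACTIVE_POWER = 1.0    # energy per cycle of an active processor
SLEEP_POWER = 0.05    # energy per cycle of a shut-down (sleeping) processor
WAKEUP_LATENCY = 500  # cycles needed to reactivate a sleeping processor

# Hypothetical profile: PROFILE[i][p] = cycles for loop nest i on p processors.
PROFILE = [
    {1: 8000, 2: 4000, 3: 2700, 4: 2000},  # nest that scales well
    {1: 3000, 2: 2600, 3: 2500, 4: 2450},  # nest that barely scales
]


def nest_cost(cycles, p, total_procs):
    """Energy and time of one nest on p processors; the rest sleep."""
    energy = cycles * (p * ACTIVE_POWER + (total_procs - p) * SLEEP_POWER)
    return energy, cycles


def schedule_cost(profile, choice, total_procs, preactivate):
    """Total (energy, time) when nest i runs on choice[i] processors."""
    energy, time = 0.0, 0
    prev = total_procs  # assume every processor is active at program start
    for nest, p in zip(profile, choice):
        if p > prev:
            woken = p - prev
            if preactivate:
                # Wake-up issued WAKEUP_LATENCY cycles early and assumed to
                # overlap the previous nest: no stall, but the woken
                # processors draw active instead of sleep power meanwhile.
                energy += woken * WAKEUP_LATENCY * (ACTIVE_POWER - SLEEP_POWER)
            else:
                # On-demand wake-up: execution stalls while the already
                # active processors wait and the others are still asleep.
                time += WAKEUP_LATENCY
                energy += WAKEUP_LATENCY * (
                    prev * ACTIVE_POWER + (total_procs - prev) * SLEEP_POWER
                )
        e, t = nest_cost(nest[p], p, total_procs)
        energy += e
        time += t
        prev = p
    return energy, time


def best_assignment(profile, total_procs, deadline, preactivate=True):
    """Minimum-energy per-nest processor counts whose total time meets deadline.

    Brute-force stand-in for an ILP: it enumerates every assignment that
    0-1 variables x[i][p] (nest i uses p processors) would encode.
    Returns (energy, time, choice), or None if no assignment is feasible.
    """
    best = None
    counts = range(1, total_procs + 1)
    for choice in product(counts, repeat=len(profile)):
        energy, time = schedule_cost(profile, choice, total_procs, preactivate)
        if time <= deadline and (best is None or energy < best[0]):
            best = (energy, time, choice)
    return best


if __name__ == "__main__":
    energy, time, choice = best_assignment(PROFILE, 4, deadline=100_000)
    print(choice, time)  # (4, 1) 5000
```

With a loose deadline, the search keeps the well-scaling nest on all four processors and shuts three processors down for the poorly scaling one. Tightening the deadline to 4500 cycles forces `(4, 3)` instead. Comparing `schedule_cost(PROFILE, (2, 4), 4, preactivate=False)` with `preactivate=True` shows the 500-cycle stall that preactivation removes in exchange for a little extra energy.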