Global productiveness propagation: a code optimization technique to speculatively prune useless narrow computations

Authors:
Indu Bhagat;Enric Gibert;Jesús Sánchez;Antonio González
Affiliations:
Universitat Politecnica de Catalunya, Barcelona, Spain;Intel Labs-UPC, Barcelona, Spain;Intel Labs-UPC, Barcelona, Spain;Intel Labs-UPC, Barcelona, Spain
Venue:
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Year:
2011

Abstract

This paper proposes a hardware-software collaborative strategy for removing useless work at 16-bit data-width granularity. The underlying motivation is to design a low-power execution platform by exploiting 'narrow' computations. The proposal uses a strictly narrow microarchitecture (a 16-bit integer datapath), which yields a low-cost, low-complexity, low-power execution engine. Software dynamically maps 64-bit computations onto this datapath by translating them into an equivalent 16-bit instruction stream and optimizing that stream. We propose Global Productiveness Propagation (GPP), a dynamic, profile-based optimization technique that infers the minimum required dataflow by pruning narrow computations that are most probably useless (non-productive). More precisely, GPP speculatively prunes the static backward slices of selected narrow computations: those whose result, in its storage location, equals the value held there at the input of the region. The technique is formulated around 'narrow' computations because they allow useful (productive) work to be distinguished from useless (non-productive) work at a finer granularity. GPP has been evaluated on an in-order narrow execution core, achieving an average dynamic instruction stream reduction of 6.6% and an overall performance improvement of 4.2%.
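
The notion of a non-productive narrow computation can be illustrated with a small sketch. It is not the authors' implementation: the helper names `split16`, `productive_chunks`, and `profile`, and the modeling of a region as a Python function on a 64-bit integer, are assumptions made here for illustration. The sketch splits a 64-bit value into four 16-bit chunks, runs a region, and records which chunks differ from their value at region input. Chunks that rarely change are the kind of candidates a profile-based pruner might speculate on.

```python
MASK16 = 0xFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split16(value):
    """Split a 64-bit value into four 16-bit chunks, least significant first."""
    return [(value >> (16 * i)) & MASK16 for i in range(4)]


def productive_chunks(region, value):
    """Run `region` (hypothetical: a function on a 64-bit integer) and report,
    per 16-bit chunk, whether the output differs from the region input."""
    result = region(value) & MASK64
    return [a != b for a, b in zip(split16(value), split16(result))]


def profile(region, samples):
    """Count, per chunk, how many sampled inputs made that chunk productive."""
    counts = [0, 0, 0, 0]
    for value in samples:
        for i, changed in enumerate(productive_chunks(region, value)):
            counts[i] += int(changed)
    return counts


if __name__ == "__main__":
    # Illustrative region: a pointer-style increment by 4.
    def bump(x):
        return x + 4

    samples = [0x7FFF12340010, 0x7FFF12340100, 0x7FFF1234FFFC]
    print(profile(bump, samples))
```

In this example the lowest chunk changes on every sample, the second chunk changes only when a carry propagates, and the upper two chunks never change. A speculative scheme would still need a recovery mechanism for the cases where the profile turns out to be wrong; that part is not modeled here.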