Table size reduction for data value predictors by exploiting narrow width values
Proceedings of the 14th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Bidwidth analysis with application to silicon compilation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
ACM Transactions on Computer Systems (TOCS)
Frequent value locality and value-centric data cache design
ACM SIGPLAN Notices
Very low power pipelines using significance compression
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
A static power model for architects
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic zero compression for cache energy reduction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Frequent value compression in data caches
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Precision and error analysis of MATLAB applications during automated hardware synthesis for FPGAs
Proceedings of the conference on Design, automation and test in Europe
Frequent value locality and value-centric data cache design
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Energy reduction in queues and stacks by adaptive bitwidth compression
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Run-time power estimation in high performance microprocessors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Energy: efficient instruction dispatch buffer design for superscalar processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
C Compiler Design for an Industrial Network Processor
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Hardware and Software Techniques for Controlling DRAM Power Modes
IEEE Transactions on Computers
Reducing power with dynamic critical path information
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Joint local and global hardware adaptations for energy
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Vectorizing Compiler for Multimedia Extensions
International Journal of Parallel Programming
HLSpower: Hybrid Statistical Modeling of the Superscalar Power-Performance Design Space
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Influence of Compiler Optimizations on Value Prediction
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Low-Cost Value Predictors Using Frequent Value Locality
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
BitValue Inference: Detecting and Exploiting Narrow Bitwidth Computations
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Energy-Efficient Design of the Reorder Buffer
PATMOS '02 Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation
Data Compression Transformations for Dynamically Allocated Data Structures
CC '02 Proceedings of the 11th International Conference on Compiler Construction
On Availability of Bit-Narrow Operations in General-Purpose Applications
FPL '00 Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
Execution Latency Reduction via Variable Latency Pipeline and Instruction Reuse
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Quantifying behavioral differences between multimedia and general-purpose workloads
Journal of Systems Architecture: the EUROMICRO Journal
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power-Aware Control Speculation through Selective Throttling
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Deterministic Clock Gating for Microprocessor Power Reduction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Partial Resolution in Data Value Predictors
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Energy-efficient issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Access Pattern Restructuring for Memory Energy
IEEE Transactions on Parallel and Distributed Systems
Software-Controlled Operand-Gating
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Speculative software management of datapath-width for energy optimization
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A Formal Approach to Frequent Energy Adaptations for Multimedia Applications
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Dynamic Functional Unit Assignment for Low Power
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dynamic functional unit assignment for low power
The Journal of Supercomputing
An Algorithm for Trading Off Quantization Error with Hardware Resources for MATLAB-Based FPGA Design
IEEE Transactions on Computers
A segmented parallel-prefix VLSI circuit with small delays for small segments
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
An asymmetric clustered processor based on value content
Proceedings of the 19th annual international conference on Supercomputing
Restrictive Compression Techniques to Increase Level 1 Cache Capacity
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Quality-driven design by bitwidth optimization for video applications
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Profiling over Adaptive Ranges
Proceedings of the International Symposium on Code Generation and Optimization
A case for asymmetric-cell cache memories
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Offline compression for on-chip ram
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Cross-component energy management: Joint adaptation of processor and memory
ACM Transactions on Architecture and Code Optimization (TACO)
Formulating and implementing profiling over adaptive ranges
ACM Transactions on Architecture and Code Optimization (TACO)
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
A Flexible Code Compression Scheme Using Partitioned Look-Up Tables
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Multiplication acceleration through twin precision
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
VAIL: variation-aware issue logic and performance binning for processor yield and profit improvement
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Exploiting narrow-width values for thermal-aware register file designs
Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Making a case for a green500 list
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Empowering a helper cluster through data-width aware instruction selection policies
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
IEEE Transactions on Circuits and Systems Part I: Regular Papers - Special section on 2009 IEEE system-on-chip conference
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DCG: deterministic clock-gating for low-power microprocessor design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
A framework for correction of multi-bit soft errors in L2 caches based on redundancy
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Value compression for efficient computation
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Power reduction of superscalar processor functional units by resizing adder-width
PATMOS'05 Proceedings of the 15th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Exploring the potential of architecture-level power optimizations
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Bit-sliced datapath for energy-efficient high performance microprocessors
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Exploiting narrow values for energy efficiency in the register files of superscalar microprocessors
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Enhanced bitwidth-aware register allocation
CC'06 Proceedings of the 15th international conference on Compiler Construction
Exploiting narrow-width values for process variation-tolerant 3-D microprocessors
Proceedings of the 49th Annual Design Automation Conference
An asymmetric adaptive-precision energy-efficient 3DIC multiplier
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Multispeculative additive trees in high-level synthesis
Proceedings of the Conference on Design, Automation and Test in Europe
Low power aging-aware register file design by duty cycle balancing
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.01 |
In general-purpose microprocessors, recent trends have pushed towards 64-bit word widths, primarily to accommodate the large addressing needs of some programs. Many integer problems, however, rarely need the full 64-bit dynamic range these CPUs provide. In fact, another recent instruction set trend has been increased support for sub-word operations (that is, manipulating data in quantities less than the full word size). In particular, most major processor families have introduced "multimedia" instruction set extensions that operate in parallel on several sub-word quantities in the same ALU.This paper notes that across the SPECint95 benchmarks, over half of the integer operation executions require 16 bits or less. With this as motivation, our work proposes hardware mechanisms that dynamically recognize and capitalize on these "narrow-bitwidth" instances. Both optimizations require little additional hardware, and neither requires compiler support.The first, power-oriented, optimization reduces processor power consumption by using aggressive clock gating to turn off portions of integer arithmetic units that will be unnecessary for narrow bitwidth operations. This optimization results in an over 50% reduction in the integer unit's power consumption for the SPECint95 and MediaBench benchmark suites. The second optimization improves performance by merging together narrow integer operations and allowing them to share a single functional unit. Conceptually akin to a dynamic form of MMX, this optimization offers speedups of 4.3%-6.2% for SPECint95 and 8.0%-10.4% for MediaBench.