ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Energy-driven integrated hardware-software optimizations using SimplePower
Proceedings of the 27th annual international symposium on Computer architecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Bidwidth analysis with application to silicon compilation
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Exploiting superword level parallelism with multimedia instruction sets
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Very low power pipelines using significance compression
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Power and energy reduction via pipeline balancing
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Software-Controlled Operand-Gating
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
An innovative low-power high-performance programmable signal processor for digital communications
IBM Journal of Research and Development
SWARP: a retargetable preprocessor for multimedia instructions: Research Articles
Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Bitwidth cognizant architecture synthesis of custom hardware accelerators
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping
Proceedings of the International Symposium on Code Generation and Optimization
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Bit-sliced datapath for energy-efficient high performance microprocessors
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Hi-index | 0.00 |
This paper evaluates managing the processor's datapath-width at the compiler level by means of exploiting dynamic narrow-width operands. We capitalize on the large occurrence of these operands in multimedia programs to build static narrow-width regions that may be directly exposed to the compiler. We propose to augment the ISA with instructions directly exposing the datapath and the register widths to the compiler. Simple exception management allows this exposition to be only speculative. In this way, we permit the software to speculatively accommodate the execution of a program on a narrower datapath-width in order to save energy. For this purpose, we introduce a novel register file organization, the byte-slice register file, which allows the width of the register file to be dynamically reconfigured, providing both static and dynamic energy savings. We show that by combining the advantages of the byte-slice register file with the advantages provided by clock-gating the datapath on a per-region basis, up to 17% of the datapath dynamic energy can be saved, while a 22% reduction of the register file static energy is achieved.