Shared-port register file architecture for low-energy VLIW processors

Authors:
Neeraj Goel;Anshul Kumar;Preeti Ranjan Panda
Affiliations:
Department of Computer Science and Engineering, IIT Delhi, India;Department of Computer Science and Engineering, IIT Delhi, India;Department of Computer Science and Engineering, IIT Delhi, India
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2014


Abstract

We propose a reduced-port Register File (RF) architecture for reducing RF energy in a VLIW processor. With fewer ports, RF ports must be shared among Function Units (FUs), which may lead to access conflicts and thus reduced performance. Our solution includes (i) a carefully designed RF-FU interconnection network that permits port sharing with minimal conflicts and without any delay or energy overhead, and (ii) a novel scheduling and binding algorithm that reduces the performance penalty. With our solution, we observed RF energy savings of as much as 83% with no more than a 10% loss in performance on a set of MediaBench and MiBench benchmarks.
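
To illustrate why sharing ports can cost cycles, the following is a minimal sketch, not the scheduling and binding algorithm of the paper. It assumes a single pool of shared read ports, independent operations with no data dependences, and a greedy in-order packing policy; the function name `schedule`, the operation list, and the port counts are all hypothetical values chosen for illustration.

```python
def schedule(ops, num_fus, read_ports):
    """Greedily pack independent ops into VLIW cycles.

    ops        -- list of (name, reads) pairs in priority order, where
                  `reads` is the number of RF read ports the op needs
    num_fus    -- number of function units (max ops issued per cycle)
    read_ports -- RF read ports shared by all FUs (assumed single pool)

    Returns a list of cycles, each a list of op names issued together.
    """
    pending = list(ops)
    cycles = []
    while pending:
        used_ports = 0
        issued = []
        deferred = []
        for name, reads in pending:
            if reads > read_ports:
                # Such an op could never issue; fail instead of looping forever.
                raise ValueError(f"{name} needs more ports than the RF has")
            if len(issued) < num_fus and used_ports + reads <= read_ports:
                issued.append(name)
                used_ports += reads
            else:
                # Port or FU conflict: push the op to a later cycle.
                deferred.append((name, reads))
        cycles.append(issued)
        pending = deferred
    return cycles


# Hypothetical workload: four independent ops on a 4-FU machine.
ops = [("add", 2), ("mul", 2), ("ld", 1), ("sub", 2)]
full = schedule(ops, num_fus=4, read_ports=8)    # enough ports for every FU
shared = schedule(ops, num_fus=4, read_ports=4)  # ports shared among FUs
print(len(full), len(shared))
```

With eight read ports all four operations issue in one cycle; with four shared ports the same operations take two cycles. A conflict-aware scheduler and interconnect of the kind the abstract describes aims to keep that penalty small.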