Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Compiler code transformations for superscalar-based high performance systems
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
Register file port requirements of transport triggered architectures
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation
Proceedings of the 28th annual international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The energy complexity of register files
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
TTAs: missing the ILP complexity wall
Journal of Systems Architecture: the EUROMICRO Journal - Special double issue on microprocessor architecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Concrete Math
Reducing register ports for higher speed and lower energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports using delayed write-back queues and operand pre-fetch
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Automatic Architectural Synthesis of VLIW and EPIC Processors
Proceedings of the 12th international symposium on System synthesis
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Power Breakdown Analysis for a Heterogeneous NoC Platform Running a Video Application
ASAP '05 Proceedings of the 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Bypass aware instruction scheduling for register file power reduction
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
Power Reduction in VLIW Processor with Compiler Driven Bypass Network
VLSID '07 Proceedings of the 20th International Conference on VLSI Design held jointly with 6th International Conference: Embedded Systems
Exploiting virtual registers to reduce pressure on real registers
ACM Transactions on Architecture and Code Optimization (TACO)
Exploring the Limits of Port Reduction in Centralized Register Files
VLSID '09 Proceedings of the 2009 22nd International Conference on VLSI Design
Trimaran: an infrastructure for research in instruction-level parallelism
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Low-power data forwarding for VLIW embedded architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
We propose a reduced-port Register File (RF) architecture for reducing RF energy in a VLIW processor. With port reduction, RF ports need to be shared among Function Units (FUs), which may lead to access conflicts, and thus, reduced performance. Our solution includes (i) a carefully designed RF-FU interconnection network that permits port sharing with minimum conflicts and without any delay/energy overheads, and (ii) a novel scheduling and binding algorithm that reduces the performance penalty. With our solution, we observed as much as 83% RF energy savings with no more than a 10% loss in performance for a set of Mediabench and Mibench benchmarks.