Partitioned register files for VLIWs: a preliminary analysis of tradeoffs

Authors:
Andrea Capitanio;Nikil Dutt;Alexandru Nicolau
Venue:
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Year:
1992

References: 6
Cited by: 64

A Mapping Strategy for Parallel Processing

IEEE Transactions on Computers
A VLIW architecture for a trace scheduling compiler

IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Efficient algorithm for graph-partitioning problem using a problem transformation method

Computer-Aided Design
A three-port/three-access register file for concurrent processing and I/O communication in a RISC-like graphics engine

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
A linear-time heuristic for improving network partitions

DAC '82 Proceedings of the 19th Design Automation Conference
Bulldog: a compiler for VLIW architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)

Bulldog: a compiler for VLIW architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)

Register file port requirements of transport triggered architectures

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The performance impact of incomplete bypassing in processor pipelines

Proceedings of the 28th annual international symposium on Microarchitecture
Exploiting short-lived variables in superscalar processors

Proceedings of the 28th annual international symposium on Microarchitecture
Partitioned register file for TTAs

Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Custom-fit processors: letting applications define architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting idle floating-point resources for integer execution

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Quantitative Evaluation of Register Pressure on Software Pipelined Loops

International Journal of Parallel Programming
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Widening resources: a cost-effective technique for aggressive ILP architectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a raw machine

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
Communication scheduling

ACM SIGPLAN Notices
Modulo scheduling for a fully-distributed clustered VLIW architecture

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Two-level hierarchical register file organization for VLIW processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Communication scheduling

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
High-quality operation binding for clustered VLIW datapaths

Proceedings of the 38th annual Design Automation Conference
Loop Transformations for Architectures with Partitioned Register Banks

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Instruction scheduling for clustered VLIW architectures

ISSS '00 Proceedings of the 13th international symposium on System synthesis
Tailoring pipeline bypassing and functional unit mapping to application in clustered VLIW architectures

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures

IEEE Transactions on Computers
Affinity-based cluster assignment for unrolled loops

ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing Loop Performance for Clustered VLIW Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Register File Architecture and Compilation Scheme for Clustered ILP Processors

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Convergent scheduling

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Non-Consistent Dual Register Files to Reduce Register Pressure

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Dynamically managing the communication-parallelism trade-off in future clustered processors

Proceedings of the 30th annual international symposium on Computer architecture
Partitioned Schedules for Clustered VLIW Architectures

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
A scalable wide-issue clustered VLIW with a reconfigurable interconnect

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Reducing register pressure through LAER algorithm

ACSC '04 Proceedings of the 27th Australasian conference on Computer science - Volume 26
Instruction buffering exploration for low energy VLIWs with instruction clusters

Proceedings of the 2004 Asia and South Pacific Design Automation Conference
An analysis of a resource efficient checkpoint architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Compiler-directed Data Partitioning for Multicluster Processors

Proceedings of the International Symposium on Code Generation and Optimization
Virtual Cluster Scheduling Through the Scheduling Graph

Proceedings of the International Symposium on Code Generation and Optimization
Heterogeneous Clustered VLIW Microarchitectures

Proceedings of the International Symposium on Code Generation and Optimization
Rapid VLIW processor customization for signal processing applications using combinational hardware functions

EURASIP Journal on Applied Signal Processing
Building a large instruction window through ROB compression

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Communication optimizations for global multi-threaded instruction scheduling

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores

Journal of Signal Processing Systems
Asymmetrically banked value-aware register files for low-energy and high-performance

Microprocessors & Microsystems
Reducing complexity of multiobjective design space exploration in VLIW-based embedded systems

ACM Transactions on Architecture and Code Optimization (TACO)
A Multi-Shared Register File Structure for VLIW Processors

Journal of Signal Processing Systems
Data pipeline optimization for shared memory multiple-SIMD architecture

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Mt-ADRES: multithreading on coarse-grained reconfigurable architecture

ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
Exploiting narrow-width values for thermal-aware register file designs

Proceedings of the Conference on Design, Automation and Test in Europe
CROB: implementing a large instruction window through compression

Transactions on high-performance embedded architectures and compilers III
Integrating a new cluster assignment and scheduling algorithm into an experimental retargetable code generation framework

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
SIMD defragmenter: efficient ILP realization on data-parallel architectures

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Compiler-assisted energy optimization for clustered VLIW processors

Journal of Parallel and Distributed Computing
Compiler supports for VLIW DSP processors with SIMD intrinsics

Concurrency and Computation: Practice & Experience
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Shared-port register file architecture for low-energy VLIW processors

ACM Transactions on Architecture and Code Optimization (TACO)
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
