Tailoring pipeline bypassing and functional unit mapping to application in clustered VLIW architectures

Authors:
Marcio Buss, Rodolfo Azevedo, Paulo Centoducatte, Guido Araujo
Affiliations:
IC - UNICAMP, Campinas, SP, Brazil (all authors)
Venue:
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2001

Abstract

In this paper we describe a design exploration methodology for clustered VLIW architectures. The central idea is a set of three techniques aimed at reducing the cost of expensive inter-cluster copy operations. First, instruction scheduling uses a list-scheduling algorithm that keeps operand chains in the same register file. Second, functional units are assigned to clusters based on the application's inter-cluster communication pattern. Finally, pipeline bypasses are carefully inserted to increase the number of data dependencies that can be satisfied by pipeline register operands. Experimental results, using the SPEC95 benchmarks and the IMPACT compiler, reveal a substantial reduction in the number of copies between clusters.
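
To make the idea of keeping operand chains in one register file concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy dependence graph, a fixed per-cluster capacity, and a simple copy-cost model (one copy per producer and destination cluster). The function names `count_copies`, `assign_chain_aware`, and `assign_round_robin` are hypothetical and introduced only for illustration.

```python
from collections import Counter

def count_copies(deps, cluster_of):
    """Count inter-cluster copies: one per (producer, destination cluster) pair."""
    copies = set()
    for consumer, producers in deps.items():
        for p in producers:
            if cluster_of[p] != cluster_of[consumer]:
                copies.add((p, cluster_of[consumer]))
    return len(copies)

def assign_chain_aware(order, deps, n_clusters, capacity):
    """Greedy assignment (illustrative assumption, not the paper's scheduler).

    Each operation goes to the cluster that already holds most of its
    operands; ties and full clusters fall back to the least-loaded cluster.
    `order` must be a topological order of the dependence graph.
    """
    cluster_of = {}
    load = [0] * n_clusters
    for op in order:
        votes = Counter(cluster_of[p] for p in deps.get(op, []))
        # Rank clusters by operand votes (descending), then load (ascending).
        ranked = sorted(range(n_clusters), key=lambda c: (-votes[c], load[c], c))
        chosen = next(c for c in ranked if load[c] < capacity)
        cluster_of[op] = chosen
        load[chosen] += 1
    return cluster_of

def assign_round_robin(order, n_clusters):
    """Baseline that ignores dependences entirely."""
    return {op: i % n_clusters for i, op in enumerate(order)}

# Toy input: two independent three-operation chains, two clusters.
deps = {"a2": ["a1"], "a3": ["a2"], "b2": ["b1"], "b3": ["b2"]}
order = ["a1", "a2", "a3", "b1", "b2", "b3"]

chain_aware = assign_chain_aware(order, deps, n_clusters=2, capacity=3)
round_robin = assign_round_robin(order, n_clusters=2)

print("chain-aware copies:", count_copies(deps, chain_aware))
print("round-robin copies:", count_copies(deps, round_robin))
```

On this toy input the chain-aware assignment places each chain entirely within one cluster, while the round-robin baseline splits every producer from its consumer. A real clustered-VLIW scheduler would also have to account for issue slots, latencies, and functional-unit availability, none of which are modeled here.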