Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Efficient algorithm for graph-partitioning problem using a problem transformation method
Computer-Aided Design
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The performance impact of incomplete bypassing in processor pipelines
Proceedings of the 28th annual international symposium on Microarchitecture
Advanced compiler design and implementation
Advanced compiler design and implementation
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Instruction scheduling for clustered VLIW architectures
ISSS '00 Proceedings of the 13th international symposium on System synthesis
Exploring performance tradeoffs for clustered VLIW ASIPs
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Pipelining and Bypassing in a VLIW Processor
IEEE Transactions on Parallel and Distributed Systems
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Partitioned Schedules for Clustered VLIW Architectures
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Operation tables for scheduling in the presence of incomplete bypassing
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Retargetable pipeline hazard detection for partially bypassed processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
In this paper we describe a design exploration methodology for clustered VLIW architectures. The central idea of this work is a set of three techniques aimed at reducing the cost of expensive inter-cluster copy operations. Instruction scheduling is performed using a list-scheduling algorithm that stores operand chains into the same register file. Functional units are assigned to clusters based on the application inter-cluster communication pattern. Finally, a careful insertion of pipeline bypasses is used to increase the number of data-dependencies that can be satisfied by pipeline register operands. Experimental results, using the SPEC95 benchmark and the IMPACT compiler, reveal a substantial reduction in the number of copies between clusters.