Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Partitioned register files for VLIWs: a preliminary analysis of tradeoffs
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Analysis of multilevel graph partitioning
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Effective cluster assignment for modulo scheduling
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Asymmetric-frequency clustering: a power-aware back-end for high-performance processors
Proceedings of the 2002 international symposium on Low power electronics and design
The TigerSHARC DSP Architecture
IEEE Micro
A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
Proceedings of the 30th annual international symposium on Computer architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Swing Modulo Scheduling: A Lifetime-Sensitive Approach
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
A Distributed Control Path Architecture for VLIW Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Hi-index | 0.00 |
Increasing performance, while at the same time reducing power consumption, is a major design tradeoff in current microprocessors. In this paper, we investigate the potential of using a heterogeneous clustered VLIW microarchitecture. In the proposed microarchitecture, each cluster, the interconnection network and the supporting memory hierarchy can run at different frequencies and voltages. Some of the clusters can then be configured to be performanceoriented and run at high frequency, while the other clusters can be configured to be low-power-oriented and run at lower frequencies, thus reducing overall consumption. For this heterogeneous design to be effective, we need to select the most suitable frequencies and voltages for each component. We propose a scheme to choose these parameters based on a model that estimates the energy consumption and the execution time of floating-point codes at compile time. Finally, we present a modulo scheduling technique based on graph partitioning that exploits the opportunities presented on heterogeneous clustered microarchitectures. Results show that the Energy-Delay2 product (ED2) can be significantly reduced by 15% on average for a microarchitecture with 4-clusters and by as much as 35% for selected programs.