As the difficulty and cost of distributing a single global clock throughout a processor grow with each generation, Globally-Asynchronous Locally-Synchronous (GALS) designs offer an alternative to conventional synchronous processors. In this paper, we propose Dynamic Instruction Cascading (DIC), a technique that executes two dependent instructions in one cycle by scaling down the clock frequency. Lowering the clock frequency lengthens the cycle, so a signal can propagate farther within it, which makes it possible to compute two dependent instructions in one cycle. DIC suits GALS processors well because only the clock frequency of the target domain needs to be lowered, which avoids slowing down the other domains. Our results show an average performance improvement of 7% on SPEC CPU2000 Integer and MediaBench applications, assuming that DIC becomes possible when the clock frequency is lowered to 80%.
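
The trade-off behind DIC can be illustrated with a back-of-the-envelope calculation. The Python sketch below is not taken from the paper: the function name `chain_time`, the single-cycle instruction latency, the perfect pairing of dependent instructions, and the reading of "80%" as 0.8 times the baseline frequency are all assumptions made for illustration, and inter-domain synchronization costs are ignored.

```python
import math


def chain_time(n_instructions, freq_scale=1.0, cascade=False):
    """Time, in baseline clock periods, to run a chain of dependent
    single-cycle instructions in one clock domain (hypothetical model).

    freq_scale: domain frequency as a fraction of the baseline (0 < f <= 1).
    cascade:    if True, two dependent instructions complete per cycle.
    """
    if not 0 < freq_scale <= 1:
        raise ValueError("freq_scale must be in (0, 1]")
    per_cycle = 2 if cascade else 1
    cycles = math.ceil(n_instructions / per_cycle)
    cycle_time = 1.0 / freq_scale  # a slower clock means a longer cycle
    return cycles * cycle_time


# A chain of 8 dependent instructions at full speed, one per cycle.
baseline = chain_time(8)

# The same chain with the domain clocked at 80% and pairs cascaded.
with_dic = chain_time(8, freq_scale=0.8, cascade=True)

# A lone instruction gains nothing from cascading but still pays
# for the longer cycle.
lone = chain_time(1, freq_scale=0.8, cascade=True)

print(baseline, with_dic, lone)
```

Under these assumptions the 8-instruction chain takes 8 baseline periods without DIC and 5 with it, while a lone instruction takes 1.25 periods instead of 1. This idealized upper bound is far above the 7% average reported, which is consistent with real programs containing many instructions that cannot be paired.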