Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Power and performance evaluation of globally asynchronous locally synchronous processors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
A Low-Latency FIFO for Mixed-Clock Systems
WVLSI '00 Proceedings of the IEEE Computer Society Annual Workshop on VLSI (WVLSI'00)
Self-Calibrating Clocks for Globally Asynchronous Locally Synchronous Systems
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
Proceedings of the 30th annual international symposium on Computer architecture
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the conference on Design, automation and test in Europe
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
We propose an adaptive scalable architecture suitable for performing real-time algorithm-specific tasks. The architecture is based on Globally Asynchronous and Locally Synchronous (GALS) design paradigm. We demonstrate that for different real-time commercial applications with algorithm-specific jobs like online transaction processing, Fourier transform etc., the proposed architecture allows dynamic load-balancing and adaptive inter-task voltage scaling. The architecture can also detect process-shifts for the individual processing units and determine their appropriate operating conditions. Simulation results for two representative applications show that for a random job distribution, we obtain up to 67% improvement in MOPS/W (millions of operations per second per watt) over a fully synchronous implementation.