Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling

Authors:
Greg Semeraro; Grigorios Magklis; Rajeev Balasubramonian; David H. Albonesi; Sandhya Dwarkadas; Michael L. Scott
Venue:
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Year:
2002

Abstract

As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a Multiple Clock Domain (MCD) processor, in which the chip is divided into several (coarse-grained) clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end (including L1 instruction cache), integer units, floating point units, and load-store units (including L1 data cache and L2 cache). We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Dynamic runs incorporate a detailed model of inter-domain synchronization delays, with latencies for intra-domain scaling similar to the whole-chip scaling latencies of Intel XScale and Transmeta LongRun technologies. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD compared to a modest 3% savings from voltage scaling a single clock and voltage system.
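The energy-delay product metric used above can be illustrated with a minimal sketch. It assumes the common first-order model in which dynamic energy scales with the square of supply voltage. The four domain names come from the abstract; the `Domain` class, the equal energy shares, the 0.7 voltage scale, and the 2% slowdown are hypothetical values chosen only for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """One clock domain (all numeric fields are illustrative assumptions)."""
    name: str
    energy_share: float  # fraction of baseline chip energy spent in this domain
    volt_scale: float    # supply voltage as a fraction of the baseline voltage


def relative_energy(domains):
    """Chip energy relative to baseline, assuming dynamic energy ~ V^2."""
    return sum(d.energy_share * d.volt_scale ** 2 for d in domains)


def edp_improvement(rel_energy, rel_delay):
    """Fractional energy-delay product improvement over a baseline of 1.0 x 1.0."""
    return 1.0 - rel_energy * rel_delay


# Hypothetical configuration: only the floating point domain is scaled down,
# as might happen during an integer-dominated program phase.
DOMAINS = [
    Domain("front end", 0.25, 1.0),
    Domain("integer", 0.25, 1.0),
    Domain("floating point", 0.25, 0.7),
    Domain("load-store", 0.25, 1.0),
]

if __name__ == "__main__":
    energy = relative_energy(DOMAINS)
    # Assumed 2% slowdown from the scaled domain and synchronization costs.
    delay = 1.02
    print(f"relative energy: {energy:.4f}")
    print(f"EDP improvement: {edp_improvement(energy, delay):.2%}")
```

The sketch shows why per-domain scaling can help: lowering voltage in a domain that is off the critical path cuts energy roughly quadratically while adding little delay, so the product of the two falls.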