Slack: maximizing performance under technological constraints
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Power and performance evaluation of globally asynchronous locally synchronous processors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Asymmetric-frequency clustering: a power-aware back-end for high-performance processors
Proceedings of the 2002 international symposium on Low power electronics and design
Control-theoretic dynamic frequency and voltage scaling for multimedia workloads
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Power efficiency of voltage scaling in multiple clock, multiple voltage cores
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Managing static leakage energy in microprocessor functional units
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic frequency and voltage control for a multiple clock domain microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
ASYNC '03 Proceedings of the 9th International Symposium on Asynchronous Circuits and Systems
Temperature-aware microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
Proceedings of the 30th annual international symposium on Computer architecture
A critical analysis of application-adaptive multiple clock processors
Proceedings of the 2003 international symposium on Low power electronics and design
A mixed-clock issue queue design for globally asynchronous, locally synchronous processor cores
Proceedings of the 2003 international symposium on Low power electronics and design
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
A low power approach to system level pipelined interconnect design
Proceedings of the 2004 international workshop on System level interconnect prediction
Hybrid Architectural Dynamic Thermal Management
Proceedings of the conference on Design, automation and test in Europe - Volume 1
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor
Proceedings of the 31st annual international symposium on Computer architecture
Application adaptive energy efficient clustered architectures
Proceedings of the 2004 international symposium on Low power electronics and design
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Formal online methods for voltage/frequency control in multiple clock domain microprocessors
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Dynamically Trading Frequency for Complexity in a GALS Microprocessor
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
ISQED '05 Proceedings of the 6th International Symposium on Quality of Electronic Design
A flexible simulation framework for graphics architectures
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
IEEE Transactions on Computers
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines
Proceedings of the 32nd annual international symposium on Computer Architecture
Coordinated, distributed, formal energy management of chip multiprocessors
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Combined circuit and architectural level variable supply-voltage scaling for low power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Toward a multiple clock/voltage island design style for power-aware processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Power reduction techniques for microprocessor systems
ACM Computing Surveys (CSUR)
An indirect current sensing technique for IDDQ and IDDT tests
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
Evaluation of the field-programmable cache: performance and energy consumption
Proceedings of the 3rd conference on Computing frontiers
Independent front-end and back-end dynamic voltage scaling for a GALS microarchitecture
Proceedings of the 2006 international symposium on Low power electronics and design
Synergistic temperature and energy management in GALS processor architectures
Proceedings of the 2006 international symposium on Low power electronics and design
Hardware based frequency/voltage control of voltage frequency island systems
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Energy efficient prefetching and caching
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Heterogeneous Clustered VLIW Microarchitectures
Proceedings of the International Symposium on Code Generation and Optimization
Clock-frequency assignment for multiple clock domain systems-on-a-chip
Proceedings of the conference on Design, automation and test in Europe
A Survey and Taxonomy of GALS Design Styles
IEEE Design & Test
Performance Evaluation of Elastic GALS Interfaces and Network Fabric
Electronic Notes in Theoretical Computer Science (ENTCS)
Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP
Journal of Parallel and Distributed Computing
Compiler-directed frequency and voltage scaling for a multiple clock domain microarchitecture
Proceedings of the 5th conference on Computing frontiers
A scalable dual-clock FIFO for data transfers between arbitrary and haltable clock domains
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Shapeshifter: Dynamically changing pipeline width and speed to address process variations
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Improving SMT performance: an application of genetic algorithms to configure resizable caches
Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
Power management of voltage/frequency island-based systems using hardware-based methods
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A case for dynamic frequency tuning in on-chip networks
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic capacity-speed tradeoffs in SMT processor caches
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
The implementation and evaluation of a low-power clock distribution network based on EPIC
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Online energy-saving algorithm for sensor networks in dynamic changing environments
Journal of Embedded Computing
Energy Efficient Resource Management in Virtualized Cloud Data Centers
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Energy-efficient scheduling of real-time periodic tasks in multicore systems
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
CPM in CMPs: Coordinated Power Management in Chip-Multiprocessors
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Empowering a helper cluster through data-width aware instruction selection policies
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
TH-1: China's first petaflop supercomputer
Frontiers of Computer Science in China
An Analysis of Power Consumption Logs from a Monitored Grid Site
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Energy-Aware Loop Parallelism Maximization for Multi-core DSP Architectures
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
GALDS: a complete framework for designing multiclock ASICs and socs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
RAFT: A router architecture with frequency tuning for on-chip networks
Journal of Parallel and Distributed Computing
Repeater insertion in power-managed VLSI systems
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Low-energy GALS NoC with FIFO-Monitoring dynamic voltage scaling
Microelectronics Journal
High performance, energy efficiency, and scalability with GALS chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Simulating a LAGS processor to consider variable latency on L1 D-Cache
Proceedings of the 2010 Summer Computer Simulation Conference
Adaptive energy-management features of the IBM POWER 7 chip
IBM Journal of Research and Development
Evaluation of dynamic voltage and frequency scaling for stream programs
Proceedings of the 8th ACM International Conference on Computing Frontiers
An energy-efficient heterogeneous CMP based on hybrid TFET-CMOS cores
Proceedings of the 48th Design Automation Conference
A phase adaptive cache hierarchy for SMT processors
Microprocessors & Microsystems
Performance and power evaluation of an intelligently adaptive data cache
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Dynamic instruction cascading on GALS microprocessors
PATMOS'05 Proceedings of the 15th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Synchroscalar: initial lessons in power-aware design of a tile-based embedded architecture
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Dynamic processor throttling for power efficient computations
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Journal of Computer and System Sciences
Hi-index | 0.00 |
As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a Multiple Clock Domain (MCD) processor, in which the chip is divided into several (coarse-grained) clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end (including L1 instruction cache), integer units, floating point units, and load-store units (including L1 data cache and L2 cache). We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Dynamic runs incorporate a detailed model of inter-domain synchronization delays, with latencies for intra-domain scaling similar to the whole-chip scaling latencies of Intel XScale and Transmeta LongRun technologies. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD compared to a modest 3% savings from voltage scaling a single clock and voltage system.