Trace selection for compiling large C application programs to microcode
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Improving the accuracy of static branch prediction using branch correlation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The Coign automatic distributed partitioning system
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Slack: maximizing performance under technological constraints
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Power and performance evaluation of globally asynchronous locally synchronous processors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic frequency and voltage control for a multiple clock domain microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Interfacing Synchronous and Asynchronous Modules Within a High-Speed Pipeline
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Globally-asynchronous locally-synchronous systems (performance, reliability, digital)
Globally-asynchronous locally-synchronous systems (performance, reliability, digital)
ATOM: a flexible interface for building high performance program analysis tools
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
Positional adaptation of processors: application to energy reduction
Proceedings of the 30th annual international symposium on Computer architecture
VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Profile-based adaptation for cache decay
ACM Transactions on Architecture and Code Optimization (TACO)
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Formal online methods for voltage/frequency control in multiple clock domain microprocessors
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
The Energy Impact of Aggressive Loop Fusion
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Dynamically Trading Frequency for Complexity in a GALS Microprocessor
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
ISQED '05 Proceedings of the 6th International Symposium on Quality of Electronic Design
GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
IEEE Transactions on Computers
Coordinated, distributed, formal energy management of chip multiprocessors
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Combined circuit and architectural level variable supply-voltage scaling for low power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hibernator: helping disk arrays sleep through the winter
Proceedings of the twentieth ACM symposium on Operating systems principles
Power reduction techniques for microprocessor systems
ACM Computing Surveys (CSUR)
Gated memory control for memory monitoring, leak detection and garbage collection
Proceedings of the 2005 workshop on Memory system performance
Evaluation of the field-programmable cache: performance and energy consumption
Proceedings of the 3rd conference on Computing frontiers
Program-level adaptive memory management
Proceedings of the 5th international symposium on Memory management
Energy-efficient instruction scheduling utilizing cache miss information
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS
ACM SIGARCH Computer Architecture News
Heterogeneous Clustered VLIW Microarchitectures
Proceedings of the International Symposium on Code Generation and Optimization
Integrated CPU and l2 cache voltage scaling using machine learning
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing
Analysis of input-dependent program behavior using active profiling
Proceedings of the 2007 workshop on Experimental computer science
Analysis of input-dependent program behavior using active profiling
ecs'07 Experimental computer science on Experimental computer science
Compiler-directed frequency and voltage scaling for a multiple clock domain microarchitecture
Proceedings of the 5th conference on Computing frontiers
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic capacity-speed tradeoffs in SMT processor caches
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Compiler-directed dynamic voltage scaling using program phases
HiPC'07 Proceedings of the 14th international conference on High performance computing
Integrated CPU cache power management in multiple clock domain processors
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Applying statistical machine learning to multicore voltage & frequency scaling
Proceedings of the 7th ACM international conference on Computing frontiers
Interval-based models for run-time DVFS orchestration in superscalar processors
Proceedings of the 7th ACM international conference on Computing frontiers
The implementation and evaluation of a low-power clock distribution network based on EPIC
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Evaluating iterative optimization across 1000 datasets
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Energy-aware routing in data center network
Proceedings of the first ACM SIGCOMM workshop on Green networking
Temperature-aware task scheduling algorithm for soft real-time multi-core systems
Journal of Systems and Software
Program phase detection and exploitation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Task Allocation and Migration Algorithm for Temperature-Constrained Real-Time Multi-Core Systems
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
GALDS: a complete framework for designing multiclock ASICs and socs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs
Proceedings of the 8th ACM International Conference on Computing Frontiers
A phase adaptive cache hierarchy for SMT processors
Microprocessors & Microsystems
TSIC: thermal scheduling simulator for chip multiprocessors
PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Performance and power evaluation of an intelligently adaptive data cache
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Dynamic code region (DCR) based program phase tracking and prediction for dynamic optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Identifying the optimal energy-efficient operating points of parallel workloads
Proceedings of the International Conference on Computer-Aided Design
Phase-Based miss rate prediction across program inputs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Pack & Cap: adaptive DVFS and thread packing under power caps
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic processor throttling for power efficient computations
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Compiler directed issue queue energy reduction
Transactions on High-Performance Embedded Architectures and Compilers IV
Deconstructing iterative optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Evaluation of Low-Power Computing when Operating on Subsets of Multicore Processors
Journal of Signal Processing Systems
Greening data center networks with throughput-guaranteed power-aware routing
Computer Networks: The International Journal of Computer and Telecommunications Networking
Energy-efficient work-stealing language runtimes
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.01 |
A Multiple Clock Domain (MCD) processor addresses the challenges of clock distribution and power dissipation by dividing a chip into several (coarse-grained) clock domains, allowing frequency and voltage to be reduced in domains that are not currently on the application's critical path. Given a reconfiguration mechanism capable of choosing appropriate times and values for voltage/frequency scaling, an MCD processor has the potential to achieve significant energy savings with low performance degradation.Early work on MCD processors evaluated the potential for energy savings by manually inserting reconfiguration instructions into applications, or by employing an oracle driven by off-line analysis of (identical) prior program runs. Subsequent work developed a hardware-based on-line mechanism that averages 75--85% of the energy-delay improvement achieved via off-line analysis.In this paper we consider the automatic insertion of reconfiguration instructions into applications, using profile-driven binary rewriting. Profile-based reconfiguration introduces the need for "training runs" prior to production use of a given application, but avoids the hardware complexity of on-line reconfiguration. It also has the potential to yield significantly greater energy savings. Experimental results (training on small data sets and then running on larger, alternative data sets) indicate that the profile-driven approach is more stable than hardware-based reconfiguration, and yields virtually all of the energy-delay improvement achieved via off-line analysis.