Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies in both power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g., caches, predictors, instruction windows) can adapt dynamically to program behavior, enabling and disabling resources as needed. A key element of existing configuration algorithms is adjusting to program phase changes. This is typically done by "tuning" when a phase change is detected, i.e., sequencing through a series of trial configurations and selecting the best.

We study algorithms that dynamically collect and analyze program working set information. To make this practical, we propose working set signatures: highly compressed working set representations (e.g., 32-128 bytes total). The algorithms use working set signatures to 1) detect working set changes and trigger re-tuning; 2) identify recurring working sets and re-install saved optimal reconfigurations, thus avoiding the time-consuming tuning process; and 3) estimate working set sizes to configure caches directly to the proper size, also avoiding the tuning process. Multi-configuration instruction caches are used to demonstrate the performance of the proposed algorithms. When applied to reconfigurable instruction caches, an algorithm that identifies recurring phases achieves power savings and performance similar to the best algorithm reported to date, but with orders-of-magnitude savings in the number of re-tunings.
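The three uses of a working set signature listed above can be illustrated with a minimal Python sketch. The abstract does not specify the construction, so everything concrete here is an assumption made for illustration: the 1024-bit (128-byte) bit vector, the 64-byte block granularity, the BLAKE2b hash, the relative-distance metric with a 0.5 threshold, the occupancy-based size estimate, the list of cache sizes, and all function and class names. It is a software model of the idea, not the paper's hardware mechanism.

```python
import hashlib
import math

SIG_BITS = 1024    # assumed 128-byte signature, the upper end of the quoted range
BLOCK_SHIFT = 6    # assumed 64-byte instruction cache blocks


def make_signature(addresses, bits=SIG_BITS):
    """Hash every touched block into one bit of a fixed-size bit vector."""
    sig = 0
    for addr in addresses:
        block = (addr >> BLOCK_SHIFT).to_bytes(8, "little")
        digest = hashlib.blake2b(block, digest_size=4).digest()
        sig |= 1 << (int.from_bytes(digest, "little") % bits)
    return sig


def popcount(x):
    return bin(x).count("1")


def distance(a, b):
    """Bits set in exactly one signature, relative to bits set in either."""
    union = popcount(a | b)
    return popcount(a ^ b) / union if union else 0.0


def estimate_blocks(sig, bits=SIG_BITS):
    """Invert the expected fill fraction to estimate distinct blocks touched."""
    ones = popcount(sig)
    if ones >= bits:
        return math.inf  # saturated signature: size cannot be estimated
    return math.log(1 - ones / bits) / math.log(1 - 1 / bits)


class PhaseTable:
    """Remembers one configuration per previously seen working set."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold   # assumed change-detection threshold
        self.entries = []            # list of (signature, configuration)
        self.retunings = 0

    def observe(self, sig, tune):
        """Return a saved configuration, or call tune(sig) for a new working set."""
        for saved_sig, config in self.entries:
            if distance(saved_sig, sig) < self.threshold:
                return config        # recurring working set: no re-tuning
        self.retunings += 1
        config = tune(sig)
        self.entries.append((sig, config))
        return config


def size_cache(sig, sizes=(2048, 4096, 8192, 16384, 32768)):
    """Pick the smallest assumed cache size (bytes) that covers the estimate."""
    need = estimate_blocks(sig) * (1 << BLOCK_SHIFT)
    return next((s for s in sizes if s >= need), sizes[-1])
```

In this sketch, one signature would be built per sampling interval from the instruction addresses fetched in that interval and passed to `PhaseTable.observe`. Here `size_cache` stands in for the tuning step, so a new working set is sized directly from its signature; a trial-and-error tuner could be passed as the `tune` callback instead.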