Analysis of branch prediction via data compression
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Calpa: a tool for automating selective dynamic compilation
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Power aware microarchitecture resource scaling
Proceedings of the conference on Design, automation and test in Europe
An Architectural Framework for Runtime Optimization
IEEE Transactions on Computers
Managing multi-configuration hardware via dynamic working set analysis
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Frequent value locality and its applications
ACM Transactions on Embedded Computing Systems (TECS)
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Code Specialization Based on Value Profiles
SAS '00 Proceedings of the 7th International Symposium on Static Analysis
Vacuum packing: extracting hardware-detected program phases for post-link optimization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power-Aware Control Speculation through Selective Throttling
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Comparing Program Phase Detection Techniques
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
The Accuracy of Initial Prediction in Two-Phase Dynamic Binary Translators
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Exposing Memory Access Regularities Using Object-Relative Memory Profiling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A Formal Approach to Frequent Energy Adaptations for Multimedia Applications
Proceedings of the 31st annual international symposium on Computer architecture
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Performance directed energy management for main memory and disks
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Method-level phase behavior in java workloads
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
The Fuzzy Correlation between Code and Performance Predictability
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Effective Adaptive Computing Environment Management via Dynamic Optimization
Proceedings of the international symposium on Code generation and optimization
Proceedings of the international symposium on Code generation and optimization
Reactive Techniques for Controlling Software Speculation
Proceedings of the international symposium on Code generation and optimization
Using Phase Behavior in Scientific Application to Guide Linux Operating System Customization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Dynamic run-time architecture techniques for enabling continuous optimization
Proceedings of the 2nd conference on Computing frontiers
Visualization and analysis of phased behavior in Java programs
Proceedings of the 3rd international symposium on Principles and practice of programming in Java
A multinomial clustering model for fast simulation of computer architecture designs
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Dynamic detection and visualization of software phases
WODA '05 Proceedings of the third international workshop on Dynamic analysis
Performance directed energy management for main memory and disks
ACM Transactions on Storage (TOS)
Dynamic phase analysis for cycle-close trace generation
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Fast and fair: data-stream quality of service
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC CMP Processor
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Dynamically configurable shared CMP helper engines for improved performance
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Gated memory control for memory monitoring, leak detection and garbage collection
Proceedings of the 2005 workshop on Memory system performance
Online Phase Detection Algorithms
Proceedings of the International Symposium on Code Generation and Optimization
Region Monitoring for Local Phase Detection in Dynamic Optimization Systems
Proceedings of the International Symposium on Code Generation and Optimization
Selecting Software Phase Markers with Code Structure Analysis
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Evaluation of the field-programmable cache: performance and energy consumption
Proceedings of the 3rd conference on Computing frontiers
Fast thermal simulation for architecture level dynamic thermal management
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Software annotations for power optimization on mobile devices
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Phase-based visualization and analysis of Java programs
Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
Efficient remote profiling for resource-constrained devices
ACM Transactions on Architecture and Code Optimization (TACO)
Decomposing memory performance: data structures and phases
Proceedings of the 5th international symposium on Memory management
Impact of virtual execution environments on processor energy consumption and hardware adaptation
Proceedings of the 2nd international conference on Virtual execution environments
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
Proceedings of the 33rd annual international symposium on Computer Architecture
Wavelet-based phase classification
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Complexity-based program phase analysis and classification
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Temporal search: detecting hidden malware timebombs with virtual machines
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
HeapMD: identifying heap-based bugs using anomaly detection
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
B2Sim:: a fast micro-architecture simulator based on basic block characterization
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Improving the performance and power efficiency of shared helpers in CMPs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Wide and efficient trace prediction using the local trace predictor
Proceedings of the 20th annual international conference on Supercomputing
Effective management of multiple configurable units using dynamic optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting program phase behavior for energy reduction on multi-configuration processors
Journal of Systems Architecture: the EUROMICRO Journal
ReCycle:: pipeline adaptation to tolerate process variation
Proceedings of the 34th annual international symposium on Computer architecture
Integrated CPU and l2 cache voltage scaling using machine learning
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Addressing instruction fetch bottlenecks by using an instruction register file
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing
A one-shot configurable-cache tuner for improved energy and performance
Proceedings of the conference on Design, automation and test in Europe
Cross-component energy management: Joint adaptation of processor and memory
ACM Transactions on Architecture and Code Optimization (TACO)
A self-tuning configurable cache
Proceedings of the 44th annual Design Automation Conference
Analysis of input-dependent program behavior using active profiling
Proceedings of the 2007 workshop on Experimental computer science
Analysis of input-dependent program behavior using active profiling
ecs'07 Experimental computer science on Experimental computer science
Architectural contesting: exposing and exploiting temperamental behavior
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
Hardware counter driven on-the-fly request signatures
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Adapting to intermittent faults in multicore systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Phase-based cache reconfiguration for a highly-configurable two-level cache hierarchy
Proceedings of the 18th ACM Great Lakes symposium on VLSI
HMTT: a platform independent full-system memory trace monitoring system
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Performance modeling of parallel applications for grid scheduling
Journal of Parallel and Distributed Computing
Journal of Systems Architecture: the EUROMICRO Journal
Hill-climbing SMT processor resource distribution
ACM Transactions on Computer Systems (TOCS)
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Shapeshifter: Dynamically changing pipeline width and speed to address process variations
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
EVAL: Utilizing processors with variation-induced timing errors
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Adapting application execution in CMPs using helper threads
Journal of Parallel and Distributed Computing
Phase detection using trace compilation
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Quantifying hardware counter sampling error in computer system workload characterization
Quantifying hardware counter sampling error in computer system workload characterization
A cross-layer approach to heterogeneity and reliability
MEMOCODE'09 Proceedings of the 7th IEEE/ACM international conference on Formal Methods and Models for Codesign
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Characterizing processor thermal behavior
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Performance characterization of data mining benchmarks
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
A self-adjusting code cache manager to balance start-up time and memory usage
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
A model to exploit power-performance efficiency in superscalar processors via structure resizing
Proceedings of the 20th symposium on Great lakes symposium on VLSI
A self-adaptive scheduler for asymmetric multi-cores
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Phase complexity surfaces: characterizing time-varying program behavior
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Program behavior prediction using a statistical metric model
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Dynamically managed multithreaded reconfigurable architectures for chip multiprocessors
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
On mitigating memory bandwidth contention through bandwidth-aware scheduling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An input-centric paradigm for program dynamic optimizations
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Program phase and runtime distribution-aware online DVFS for combined Vdd/Vbb scaling
Proceedings of the Conference on Design, Automation and Test in Europe
Dynamic program phase detection in distributed shared- memory multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A Predictive Model for Dynamic Microarchitectural Adaptivity Control
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A workload-adaptive and reconfigurable bus architecture for multicore processors
International Journal of Reconfigurable Computing
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs
Proceedings of the 8th ACM International Conference on Computing Frontiers
Are hardware performance counters a cost effective way for integrity checking of programs
Proceedings of the sixth ACM workshop on Scalable trusted computing
Performance and power evaluation of an intelligently adaptive data cache
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Dynamic code region (DCR) based program phase tracking and prediction for dynamic optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Phase-Based miss rate prediction across program inputs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A detailed study on phase predictors
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Function units sharing between neighbor cores in CMP
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Offline phase analysis and optimization for multi-configuration processors
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Characterizing time-varying program behavior using phase complexity surfaces
Transactions on High-Performance Embedded Architectures and Compilers IV
Microvisor: a runtime architecture for thermal management in chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
Finding extreme behaviors in microprocessor workloads
Transactions on High-Performance Embedded Architectures and Compilers IV
Predictive power management for multi-core processors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
The HELIX project: overview and directions
Proceedings of the 49th Annual Design Automation Conference
Improving dynamic prediction accuracy through multi-level phase analysis
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Providing fairness on shared-memory multiprocessors via process scheduling
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Exploiting inter-sequence correlations for program behavior prediction
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Microarchitectural design space exploration made fast
Microprocessors & Microsystems
Predicting Performance Impact of DVFS for Realistic Memory Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
eDoctor: automatically diagnosing abnormal battery drain issues on smartphones
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Dynamic configuration prefetching based on piecewise linear prediction
Proceedings of the Conference on Design, Automation and Test in Europe
Imbalanced cache partitioning for balanced data-parallel programs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Trace based phase prediction for tightly-coupled heterogeneous cores
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic microarchitectural adaptation using machine learning
ACM Transactions on Architecture and Code Optimization (TACO)
Fast and accurate thermal modeling and simulation of manycore processors and workloads
Microelectronics Journal
Hi-index | 0.00 |
In a single second a modern processor can execute billions of instructions. Obtaining a bird's eye view of the behavior of a program at these speeds can be a difficult task when all that is available is cycle by cycle examination. In many programs, behavior is anything but steady state, and understanding the patterns of behavior, at run-time, can unlock a multitude of optimization opportunities.In this paper, we present a unified profiling architecture that can efficiently capture, classify, and predict phase-based program behavior on the largest of time scales. By examining the proportion of instructions that were executed from different sections of code, we can find generic phases that correspond to changes in behavior across many metrics. By classifying phases generically, we avoid the need to identify phases for each optimization, and enable a unified prediction scheme that can forecast future behavior. Our analysis shows that our design can capture phases that account for over 80% of execution using less that 500 bytes of on-chip memory.