Automatically characterizing large scale program behavior

Authors:
Timothy Sherwood; Erez Perelman; Greg Hamerly; Brad Calder
Affiliations:
University of California, San Diego (all four authors)
Venue:
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Year:
2002

Cited by:
449

References (14):
Robust Clustering with Applications in Computer Vision

IEEE Transactions on Pattern Analysis and Machine Intelligence
ATOM: a system for building customized program analysis tools

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Web document clustering: a feasibility demonstration

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Data clustering: a review

ACM Computing Surveys (CSUR)
HLS: combining statistical and symbolic simulation to guide microprocessor designs

Proceedings of the 27th annual international symposium on Computer architecture
Choosing representative slices of program execution for microarchitecture simulations: a preliminary application to the data stream

Workload characterization of emerging computer applications
Neural Networks for Pattern Recognition
Reducing State Loss For Effective Trace Sampling of Superscalar Processors

ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
X-means: Extending K-means with Efficient Estimation of the Number of Clusters

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Modeling Superscalar Processors via Statistical Simulation

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Experiments with Random Projection

UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Minimal Subset Evaluation: Rapid Warm-Up for Simulated Hardware State

ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors

Papers citing this work:
Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation
Pointer cache assisted prefetching

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic trace selection using performance monitoring hardware sampling

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Phi-Predication for light-weight if-conversion

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A framework for modeling and optimization of prescient instruction prefetch

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Run-time modeling and estimation of operating system power consumption

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using SimPoint for accurate and efficient simulation

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Predicting whole-program locality through reuse distance analysis

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicate prediction for efficient out-of-order execution

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Catching Accurate Profiles in Hardware

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

Proceedings of the 30th annual international symposium on Computer architecture
Positional adaptation of processors: application to energy reduction

Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay

Proceedings of the 30th annual international symposium on Computer architecture
Phase tracking and prediction

Proceedings of the 30th annual international symposium on Computer architecture
Reducing power density through activity migration

Proceedings of the 2003 international symposium on Low power electronics and design
Challenges in Computer Architecture Evaluation

Computer
Comparing Program Phase Detection Techniques

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
VHC: Quickly Building an Optimizer for Complex Embedded Architectures

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP

ACM Transactions on Architecture and Code Optimization (TACO)
Circuit and microarchitectural techniques for reducing cache leakage power

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Locality-Based Online Trace Compression

IEEE Transactions on Computers
Circuit-aware architectural simulation

Proceedings of the 41st annual Design Automation Conference
EXPERT: expedited simulation exploiting program behavior repetition

Proceedings of the 18th annual international conference on Supercomputing
Adaptive Cache Compression for High-Performance Processors

Proceedings of the 31st annual international symposium on Computer architecture
Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor

Proceedings of the 31st annual international symposium on Computer architecture
Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies

Proceedings of the 31st annual international symposium on Computer architecture
Efficient simulation of trace samples on parallel machines

Parallel Computing
A low-power in-order/out-of-order issue queue

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing pipeline energy demands with local DVS and dynamic retiming

Proceedings of the 2004 international symposium on Low power electronics and design
Profile-based adaptation for cache decay

ACM Transactions on Architecture and Code Optimization (TACO)
HIDE: an infrastructure for efficiently protecting information leakage on the address bus

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Fingerprinting: bounding soft-error detection latency and bandwidth

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Architectural Support for Enhanced SMT Job Scheduling

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Method-level phase behavior in java workloads

OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Automatic Synthesis of High-Speed Processor Simulators

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Control Flow Optimization Via Dynamic Reconvergence Prediction

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation

IEEE Micro
Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth

IEEE Micro
Maintaining Consistency and Bounding Capacity of Software Code Caches

Proceedings of the international symposium on Code generation and optimization
Phase-Aware Remote Profiling

Proceedings of the international symposium on Code generation and optimization
Reactive Techniques for Controlling Software Speculation

Proceedings of the international symposium on Code generation and optimization
DVS for On-Chip Bus Designs Based on Timing Error Correction

Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Automatic Construction and Evaluation of Performance Skeletons

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Using Phase Behavior in Scientific Application to Guide Linux Operating System Customization

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Toward an Evaluation Infrastructure for Power and Energy Optimizations

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
How to use SimPoint to pick simulation points

ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture

ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
A compressed memory hierarchy using an indirect index cache

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Accelerated warmup for sampled microarchitecture simulation

ACM Transactions on Architecture and Code Optimization (TACO)
Owl: next generation system monitoring

Proceedings of the 2nd conference on Computing frontiers
Controlling leakage power with the replacement policy in slumberous caches

Proceedings of the 2nd conference on Computing frontiers
Scheduling for heterogeneous processors in server systems

Proceedings of the 2nd conference on Computing frontiers
On the energy-efficiency of speculative hardware

Proceedings of the 2nd conference on Computing frontiers
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Replicating memory behavior for performance prediction

LCR '04 Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems
High Efficiency Counter Mode Security Architecture via Prediction and Precomputation

Proceedings of the 32nd annual international symposium on Computer Architecture
A Robust Main-Memory Compression Scheme

Proceedings of the 32nd annual international symposium on Computer Architecture
Rescue: A Microarchitecture for Testability and Defect Tolerance

Proceedings of the 32nd annual international symposium on Computer Architecture
Opportunistic Transient-Fault Detection

Proceedings of the 32nd annual international symposium on Computer Architecture
Piecewise Linear Branch Prediction

Proceedings of the 32nd annual international symposium on Computer Architecture
Store Buffer Design in First-Level Multibanked Data Caches

Proceedings of the 32nd annual international symposium on Computer Architecture
Computing Architectural Vulnerability Factors for Address-Based Structures

Proceedings of the 32nd annual international symposium on Computer Architecture
Visualization and analysis of phased behavior in Java programs

Proceedings of the 3rd international symposium on Principles and practice of programming in Java
Understanding the energy efficiency of SMT and CMP with multiclustering

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A multinomial clustering model for fast simulation of computer architecture designs

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Dynamic detection and visualization of software phases

WODA '05 Proceedings of the third international workshop on Dynamic analysis
Dynamic phase analysis for cycle-close trace generation

CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Fast and fair: data-stream quality of service

Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Online performance analysis by statistical sampling of microprocessor performance counters

Proceedings of the 19th annual international conference on Supercomputing
Improved automatic testcase synthesis for performance model validation

Proceedings of the 19th annual international conference on Supercomputing
The implications of working set analysis on supercomputing memory hierarchy design

Proceedings of the 19th annual international conference on Supercomputing
Exploring the limits of leakage power reduction in caches

ACM Transactions on Architecture and Code Optimization (TACO)
Merging path and gshare indexing in perceptron branch prediction

ACM Transactions on Architecture and Code Optimization (TACO)
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Reducing the Latency and Area Cost of Core Swapping through Shared Helper Engines

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
How to Fake 1000 Registers

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Reducing Instruction Fetch Cost by Packing Instructions into Register Windows

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Improving Region Selection in Dynamic Optimization Systems

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Address-Value Delta (AVD) Prediction: Increasing the Effectiveness of Runahead Execution by Exploiting Regular Memory Allocation Patterns

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Long-Term Workload Phases: Duration Predictions and Applications to DVFS

IEEE Micro
Autonomic Microprocessor Execution via Self-Repairing Arrays

IEEE Transactions on Dependable and Secure Computing
DBmbench: fast and accurate database workload representation on modern microarchitecture

CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
Improving memory system performance with energy-efficient value speculation

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Bridging the Processor-Memory Performance Gap with 3D IC Technology

IEEE Design & Test
Optimal sample length for efficient cache simulation

Journal of Systems Architecture: the EUROMICRO Journal
On the importance of optimizing the configuration of stream prefetchers

Proceedings of the 2005 workshop on Memory system performance
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations

IEEE Transactions on Computers
Revised Stride Data Value Predictor Design

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Design, implementation, and verification of active cache emulator (ACE)

Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Placement for configurable dataflow architecture

Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Microarchitecture evaluation with floorplanning and interconnect pipelining

Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Online Phase Detection Algorithms

Proceedings of the International Symposium on Code Generation and Optimization
Region Monitoring for Local Phase Detection in Dynamic Optimization Systems

Proceedings of the International Symposium on Code Generation and Optimization
Selecting Software Phase Markers with Code Structure Analysis

Proceedings of the International Symposium on Code Generation and Optimization
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set

Proceedings of the International Symposium on Code Generation and Optimization
Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The Camino Compiler infrastructure

ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Low cost trace-driven memory simulation using SimPoint

ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Speculative early register release

Proceedings of the 3rd conference on Computing frontiers
Evaluation of the field-programmable cache: performance and energy consumption

Proceedings of the 3rd conference on Computing frontiers
Exploiting data-dependent slack using dynamic multi-VDD to minimize energy consumption in datapath circuits

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Phase-based visualization and analysis of Java programs

Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
Efficient remote profiling for resource-constrained devices

ACM Transactions on Architecture and Code Optimization (TACO)
Decomposing memory performance: data structures and phases

Proceedings of the 5th international symposium on Memory management
Techniques for Multicore Thermal Management: Classification and New Exploration

Proceedings of the 33rd annual international symposium on Computer Architecture
Learning-Based SMT Processor Resource Distribution via Hill-Climbing

Proceedings of the 33rd annual international symposium on Computer Architecture
Measuring Benchmark Similarity Using Inherent Program Characteristics

IEEE Transactions on Computers
Instruction packing: Toward fast and energy-efficient instruction scheduling

ACM Transactions on Architecture and Code Optimization (TACO)
Automatic logging of operating system effects to guide application-level architecture simulation

SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Applying architectural vulnerability analysis to hard faults in the microprocessor

SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
SMA: a self-monitored adaptive cache warm-up scheme for microprocessor simulation

International Journal of Parallel Programming
A systematic method for functional unit power estimation in microprocessors

Proceedings of the 43rd annual Design Automation Conference
Statistical sampling of microarchitecture simulation

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Design and analysis of spatial encoding circuits for peak power reduction in on-chip buses

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Extracting and improving microarchitecture performance on reconfigurable architectures

International Journal of Parallel Programming - Special issue: The next generation software program
Core architecture optimization for heterogeneous chip multiprocessors

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Wavelet-based phase classification

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Complexity-based program phase analysis and classification

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Performance prediction based on inherent program similarity

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Adaptive reorder buffers for SMT processors

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Long-latency branches: how much do they matter?

ACM SIGARCH Computer Architecture News
IPC Considered Harmful for Multiprocessor Workloads

IEEE Micro
Efficient Sampling Startup for SimPoint

IEEE Micro
Register file caching for energy efficiency

Proceedings of the 2006 international symposium on Low power electronics and design
Power efficiency for variation-tolerant multicore processors

Proceedings of the 2006 international symposium on Low power electronics and design
Data prefetching in a cache hierarchy with high bandwidth and capacity

MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
SlicK: slice-based locality exploitation for efficient redundant multithreading

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Accurate and efficient regression modeling for microarchitectural performance and power prediction

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Efficiently exploring architectural design spaces via predictive modeling

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
HeapMD: identifying heap-based bugs using anomaly detection

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Automatic performance model construction for the fast software exploration of new hardware designs

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Yet shorter warmup by combining no-state-loss and MRRL for sampled LRU cache simulation

Journal of Systems and Software - Special issue: Quality software
The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools

Proceedings of the 20th annual international conference on Supercomputing
Accurate memory data flow modeling in statistical simulation

Proceedings of the 20th annual international conference on Supercomputing
A scalable low power issue queue for large instruction window processors

Proceedings of the 20th annual international conference on Supercomputing
The Future of Simulation: A Field of Dreams

Computer
Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses

IEEE Transactions on Computers
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads

ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
ZettaRAM: A Power-Scalable DRAM Alternative through Charge-Voltage Decoupling

IEEE Transactions on Computers
Exploiting Operand Availability for Efficient Simultaneous Multithreading

IEEE Transactions on Computers
Authentication Control Point and Its Implications For Secure Processor Design

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
DMDC: Delayed Memory Dependence Checking through Age-Based Filtering

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Mitigating the Impact of Process Variations on Processor Register Files and Execution Units

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
NSL-BLRL: Efficient Cache Warmup for Sampled Processor Simulation

ANSS '06 Proceedings of the 39th annual Symposium on Simulation
Design and Implementation of a Workload Specific Simulator

ANSS '06 Proceedings of the 39th annual Symposium on Simulation
M-TREE: a high efficiency security architecture for protecting integrity and privacy of software

Journal of Parallel and Distributed Computing - Special issue: Security in grid and distributed systems
A comparison of online and offline strategies for program adaptation

ACM-SE 45 Proceedings of the 45th annual southeast regional conference
Analysis of hardware prefetching across virtual page boundaries

Proceedings of the 4th international conference on Computing frontiers
Unified microprocessor core storage

Proceedings of the 4th international conference on Computing frontiers
Accelerating memory decryption and authentication with frequent value prediction

Proceedings of the 4th international conference on Computing frontiers
By-passing the out-of-order execution pipeline to increase energy-efficiency

Proceedings of the 4th international conference on Computing frontiers
Computational and storage power optimizations for the O-GEHL branch predictor

Proceedings of the 4th international conference on Computing frontiers
Speculative trivialization point advancing in high-performance processors

Journal of Systems Architecture: the EUROMICRO Journal
Exploiting program phase behavior for energy reduction on multi-configuration processors

Journal of Systems Architecture: the EUROMICRO Journal
Thermal modeling and management of DRAM memory systems

Proceedings of the 34th annual international symposium on Computer architecture
Interconnect design considerations for large NUCA caches

Proceedings of the 34th annual international symposium on Computer architecture
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite

Proceedings of the 34th annual international symposium on Computer architecture
Transparent control independence (TCI)

Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for bounding vulnerabilities of processor structures

Proceedings of the 34th annual international symposium on Computer architecture
Dynamic prediction of architectural vulnerability from microarchitectural state

Proceedings of the 34th annual international symposium on Computer architecture
Online diagnosis of hard faults in microprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Improved error reporting for software that uses black-box components

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Microarchitecture Sensitive Empirical Models for Compiler Optimizations

Proceedings of the International Symposium on Code Generation and Optimization
Integrated CPU and L2 cache voltage scaling using machine learning

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Efficient power modeling and software thermal sensing for runtime temperature monitoring

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications

IEEE Transactions on Parallel and Distributed Systems
Transient fault prediction based on anomalies in processor events

Proceedings of the conference on Design, automation and test in Europe
Hierarchical dynamic slicing

Proceedings of the 2007 international symposium on Software testing and analysis
An L2-miss-driven early register deallocation for SMT processors

Proceedings of the 21st annual international conference on Supercomputing
Optimization of data prefetch helper threads with path-expression based statistical modeling

Proceedings of the 21st annual international conference on Supercomputing
Increasing cache capacity through word filtering

Proceedings of the 21st annual international conference on Supercomputing
Cross-component energy management: Joint adaptation of processor and memory

ACM Transactions on Architecture and Code Optimization (TACO)
An analysis of timing violations due to spatially distributed thermal effects in global wires

Proceedings of the 44th annual Design Automation Conference
A profile-driven statistical analysis framework for the design optimization of soft real-time applications

Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
HySim: a fast simulation framework for embedded software development

CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A fast and generic hybrid simulation approach using C virtual machine

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
A profile-driven statistical analysis framework for the design optimization of soft real-time applications

The 6th Joint Meeting on European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering: companion papers
Applying Statistical Sampling for Fast and Efficient Simulation of Commercial Workloads

IEEE Transactions on Computers
Speed versus Accuracy Trade-Offs in Microarchitectural Simulations

IEEE Transactions on Computers
Data prefetching in a cache hierarchy with high bandwidth and capacity

ACM SIGARCH Computer Architecture News
Efficient architectural design space exploration via predictive modeling

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing cache misses through programmable decoders

ACM Transactions on Architecture and Code Optimization (TACO)
On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance

IEEE Transactions on Computers
Memory Data Flow Modeling in Statistical Simulation for the Efficient Exploration of Microprocessor Design Spaces

IEEE Transactions on Computers
Efficiency trends and limits from comprehensive microarchitectural adaptivity

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accurate branch prediction for short threads

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Dispersing proprietary applications as benchmarks through code mutation

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
A superscalar simulation employing poisson distributed stalls

Computers and Electrical Engineering
Rent's rule and parallel programs: characterizing network traffic behavior

Proceedings of the 2008 international workshop on System level interconnect prediction
Performance prediction with skeletons

Cluster Computing
Memory performance attacks: denial of memory service in multi-core systems

SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
The revolution inside the box

Communications of the ACM - Web science
Phase-based cache reconfiguration for a highly-configurable two-level cache hierarchy

Proceedings of the 18th ACM Great Lakes symposium on VLSI
Reducing the impact of intra-core process variability with criticality-based resource allocation and prefetching

Proceedings of the 5th conference on Computing frontiers
Asymmetrically banked value-aware register files for low-energy and high-performance

Microprocessors & Microsystems
Software-directed combined CPU/link voltage scaling for NoC-based CMPs

SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Focused prefetching: performance oriented prefetching based on commit stalls

Proceedings of the 22nd annual international conference on Supercomputing
Predictive thread-to-core assignment on a heterogeneous multi-core processor

Proceedings of the 4th workshop on Programming languages and operating systems
Early detection and bypassing of trivial operations to improve energy efficiency of processors

Microprocessors & Microsystems
ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Automated hardware-independent scenario identification

Proceedings of the 45th annual Design Automation Conference
Improve simulation efficiency using statistical benchmark subsetting: an ImplantBench case study

Proceedings of the 45th annual Design Automation Conference
Impact of dynamic voltage and frequency scaling on the architectural vulnerability of GALS architectures

Proceedings of the 13th international symposium on Low power electronics and design
Distilling the essence of proprietary workloads into miniature benchmarks

ACM Transactions on Architecture and Code Optimization (TACO)
Low-Cost Adaptive Data Prefetching

Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Multi-granularity sampling for simulating concurrent heterogeneous applications

CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Towards realistic file-system benchmarks with CodeMRI

ACM SIGMETRICS Performance Evaluation Review
Multi-optimization power management for chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Analysis and approximation of optimal co-scheduling on chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
System-scenario-based design of dynamic embedded systems

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Reducing register pressure in SMT processors through L2-miss-driven early register release

ACM Transactions on Architecture and Code Optimization (TACO)
Speculative return address stack management revisited

ACM Transactions on Architecture and Code Optimization (TACO)
Analysing and improving clustering based sampling for microprocessor simulation

International Journal of High Performance Computing and Networking
Hill-climbing SMT processor resource distribution

ACM Transactions on Computer Systems (TOCS)
COTSon: infrastructure for full system simulation

ACM SIGOPS Operating Systems Review
Generalizing neural branch prediction

ACM Transactions on Architecture and Code Optimization (TACO)
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Finding Stress Patterns in Microprocessor Workloads

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations

Transactions on High-Performance Embedded Architectures and Compilers I
Thermal Design Space Exploration of 3D Die Stacked Multi-core Processors Using Geospatial-Based Predictive Models

Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking
Generation, Validation and Analysis of SPEC CPU2006 Simulation Points Based on Branch, Memory and TLB Characteristics

Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking
Using age registers for a simple load-store queue filtering

Journal of Systems Architecture: the EUROMICRO Journal
Per-thread cycle accounting in SMT processors

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Zero loads: canceling load requests by tracking zero values

Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Memory-level parallelism aware fetch policies for simultaneous multithreading processors

ACM Transactions on Architecture and Code Optimization (TACO)
A swarm-inspired resource distribution for SMT processors

Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Systems
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
NBTI tolerant microarchitecture design in the presence of process variation

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Low-power, high-performance analog neural branch prediction

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A mechanistic performance model for superscalar out-of-order processors

ACM Transactions on Computer Systems (TOCS)
Branch Predictor Warmup for Sampled Simulation through Branch History Matching

Transactions on High-Performance Embedded Architectures and Compilers II
Data Cache Techniques to Save Power and Deliver High Performance in Embedded Systems

Transactions on High-Performance Embedded Architectures and Compilers II
Combining Edge Vector and Event Counter for Time-Dependent Power Behavior Characterization

Transactions on High-Performance Embedded Architectures and Compilers II
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors

Transactions on High-Performance Embedded Architectures and Compilers II
Power-Aware Bus Coscheduling for Periodic Realtime Applications Running on Multiprocessor SoC

Transactions on High-Performance Embedded Architectures and Compilers II
Creating artificial global history to improve branch prediction accuracy

Proceedings of the 23rd international conference on Supercomputing
Workload Reduction for Multi-input Feedback-Directed Optimization

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
An energy-efficient instruction scheduler design with two-level shelving and adaptive banking

Journal of Computer Science and Technology
NAP: a building block for remediating performance bottlenecks via black box network analysis

ICAC '09 Proceedings of the 6th international conference on Autonomic computing
Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors

Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices

Proceedings of the 36th annual international symposium on Computer architecture
Support for Urgent Computing Based on Resource Virtualization

ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Phase-guided thread-to-core assignment for improved utilization of performance-asymmetric multi-core processors

IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Enabling ultra low voltage system operation by tolerating on-chip cache failures

Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Energy-efficient register caching with compiler assistance

ACM Transactions on Architecture and Code Optimization (TACO)
Phase detection using trace compilation

PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Hybrid Techniques for Fast Multicore Simulation

Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Architecture Design for Soft Errors

Architecture Design for Soft Errors
Selective wordline voltage boosting for caches to manage yield under process variations

Proceedings of the 46th Annual Design Automation Conference
Trace-driven workload simulation method for Multiprocessor System-On-Chips

Proceedings of the 46th Annual Design Automation Conference
Enabling software management for multicore caches with a lightweight hardware support

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Space-efficient time-series call-path profiling of parallel applications

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Future scaling of processor-memory interfaces

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Reducing peak power with a table-driven adaptive processor core

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Quantifying hardware counter sampling error in computer system workload characterization

Quantifying hardware counter sampling error in computer system workload characterization
Multicore power management: ensuring robustness via early-stage formal verification

MEMOCODE'09 Proceedings of the 7th IEEE/ACM international conference on Formal Methods and Models for Codesign
A cross-layer approach to heterogeneity and reliability

MEMOCODE'09 Proceedings of the 7th IEEE/ACM international conference on Formal Methods and Models for Codesign
Utilizing predictors for efficient thermal management in multiprocessor SoCs

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Accurately evaluating application performance in simulated hybrid multi-tasking systems

Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Probabilistic job symbiosis modeling for SMT processor scheduling

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Characterizing processor thermal behavior

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Shoestring: probabilistic soft error reliability on the cheap

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Two-phase trace-driven simulation (TPTS): a fast multicore processor architecture simulation approach

Software—Practice & Experience
Performance modeling for dynamic algorithm selection

ICCS'03 Proceedings of the 2003 international conference on Computational science
Branch history matching: branch predictor warmup for sampled simulation

HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Efficient program power behavior characterization

HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Exploiting stability to reduce time-space cost for memory tracing

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part III
PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
A self-adjusting code cache manager to balance start-up time and memory usage

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Exploiting statistical correlations for proactive prediction of program behaviors

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Accelerating multi-core simulators

Proceedings of the 2010 ACM Symposium on Applied Computing
Reuse distance based cache leakage control

HiPC'07 Proceedings of the 14th international conference on High performance computing
A multi-level approach to reduce the impact of NBTI on processor functional units

Proceedings of the 20th symposium on Great lakes symposium on VLSI
A model to exploit power-performance efficiency in superscalar processors via structure resizing

Proceedings of the 20th symposium on Great lakes symposium on VLSI
A self-adaptive scheduler for asymmetric multi-cores

Proceedings of the 20th symposium on Great lakes symposium on VLSI
Using dynamic binary instrumentation to generate multi-platform SimPoints: methodology and accuracy

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Phase complexity surfaces: characterizing time-varying program behavior

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
EXACT: explicit dynamic-branch prediction with active updates

Proceedings of the 7th ACM international conference on Computing frontiers
Reusing cached schedules in an out-of-order processor with in-order issue logic

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Rapid early-stage microarchitecture design using predictive models

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Efficient resource provisioning in compute clouds via VM multiplexing

Proceedings of the 7th international conference on Autonomic computing
Adaptive simulation sampling using an autoregressive framework

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Necromancer: enhancing system throughput by animating dead cores

Proceedings of the 37th annual international symposium on Computer architecture
Leveraging the core-level complementary effects of PVT variations to reduce timing emergencies in multi-core processors

Proceedings of the 37th annual international symposium on Computer architecture
Consistent runtime thermal prediction and control through workload phase detection

Proceedings of the 47th Design Automation Conference
Automated modeling and emulation of interconnect designs for many-core chip multiprocessors

Proceedings of the 47th Design Automation Conference
Automatic Phase Detection and Structure Extraction of MPI Applications

International Journal of High Performance Computing Applications
Applied inference: Case studies in microarchitectural design

ACM Transactions on Architecture and Code Optimization (TACO)
Criticality-driven superscalar design space exploration

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Partitioning streaming parallelism for multi-cores: a machine learning based approach

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Speculative-aware execution: a simple and efficient technique for utilizing multi-cores to improve single-thread performance

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Evaluating the dynamic behaviour of Python applications

ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults

Proceedings of the Conference on Design, Automation and Test in Europe
A reconfigurable cache memory with heterogeneous banks

Proceedings of the Conference on Design, Automation and Test in Europe
Dueling CLOCK: adaptive cache replacement policy based on the CLOCK algorithm

Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting narrow-width values for thermal-aware register file designs

Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and exploitation of narrow-width loads: the narrow-width cache approach

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Federation: Boosting per-thread performance of throughput-oriented manycore architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Detailed performance analysis using coarse grain sampling

Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Comparing scalability prediction strategies on an SMP of CMPs

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Dynamic program phase detection in distributed shared-memory multiprocessors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Detecting phases in parallel applications on shared memory architectures

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Coterminous locality and coterminous group data prefetching on chip-multiprocessors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
SubsetTrio: An evolutionary, geometric, and statistical benchmark subsetting framework

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Combating Aging with the Colt Duty Cycle Equalizer

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case Vulnerability to Soft Errors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
ScalaExtrap: trace-based communication extrapolation for spmd programs

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Extended histories: improving regularity and performance in correlation prefetchers

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Runtime parallelization of legacy code on a transactional memory system

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
A scalable circuit-architecture co-design to improve memory yield for high-performance processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fine-grained DVFS using on-chip regulators

ACM Transactions on Architecture and Code Optimization (TACO)
An Embedded Software Power Model Based on Algorithm Complexity Using Back-Propagation Neural Networks

GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Automatic estimation of performance requirements for software tasks of mobile devices

Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
On the exploitation of narrow-width values for improving register file reliability

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Automatic performance debugging of SPMD-style parallel programs

Journal of Parallel and Distributed Computing
Design of last-level on-chip cache using spin-torque transfer RAM (STT RAM)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Active cache emulator

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Analysis of application-aware on-chip routing under traffic uncertainty

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template

Proceedings of the 38th annual international symposium on Computer architecture
CRIB: consolidated rename, issue, and bypass

Proceedings of the 38th annual international symposium on Computer architecture
Releasing efficient beta cores to market early

Proceedings of the 38th annual international symposium on Computer architecture
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs

Proceedings of the 8th ACM International Conference on Computing Frontiers
AstroLIT: enabling simulation-based microarchitecture comparison between Intel® and Transmeta designs

Proceedings of the 8th ACM International Conference on Computing Frontiers
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs

ACM Transactions on Architecture and Code Optimization (TACO)
CATCH: A mechanism for dynamically detecting cache-content-duplication in instruction caches

ACM Transactions on Architecture and Code Optimization (TACO)
Managing SMT resource usage through speculative instruction window weighting

ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive, transparent CPU scaling algorithms leveraging inter-node MPI communication regions

Parallel Computing
A unified approach to eliminate memory accesses early

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Segmented bitline cache: exploiting non-uniform memory access patterns

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Do trace cache, value prediction and prefetching improve SMT throughput?

ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Performance and power evaluation of an intelligently adaptive data cache

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Dynamic co-allocation of level one caches

ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
A practical method for quickly evaluating program optimizations

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Efficient sampling startup for sampled processor simulation

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Enhancing network processor simulation speed with statistical input sampling

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Improving System Energy Efficiency with Memory Rank Subsetting

ACM Transactions on Architecture and Code Optimization (TACO)
A code isolator: isolating code fragments from large programs

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A detailed study on phase predictors

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Improving accuracy of perceptron predictor through correlating data values in SMT processors

ISNN'05 Proceedings of the Second international conference on Advances in Neural Networks - Volume Part III
RDVIS: a tool that visualizes the causes of low locality and hints program optimizations

ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Offline phase analysis and optimization for multi-configuration processors

SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
A power-efficient and scalable load-store queue design

PATMOS'05 Proceedings of the 15th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
CRAM: coded registers for amplified multiporting

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
CoreRacer: a practical memory race recorder for multicore x86 TSO processors

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ScalaExtrap: Trace-based communication extrapolation for SPMD programs

ACM Transactions on Programming Languages and Systems (TOPLAS)
Global register alias table: Boosting sequential program on multi-core

Future Generation Computer Systems
Resource-Driven optimizations for transient-fault detecting superscalar microarchitectures

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Making power-efficient data value predictions

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Low-overhead core swapping for thermal management

PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Modeling runtime behavior in framework-based applications

ECOOP'06 Proceedings of the 20th European conference on Object-Oriented Programming
Characterizing time-varying program behavior using phase complexity surfaces

Transactions on High-Performance Embedded Architectures and Compilers IV
Microvisor: a runtime architecture for thermal management in chip multiprocessors

Transactions on High-Performance Embedded Architectures and Compilers IV
Finding extreme behaviors in microprocessor workloads

Transactions on High-Performance Embedded Architectures and Compilers IV
Extrinsic and intrinsic text cloning

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Studying hardware and software trade-offs for a real-life web 2.0 workload

ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Phase-based tuning for better utilization of performance-asymmetric multicore processors

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Probabilistic modeling for job symbiosis scheduling on SMT processors

ACM Transactions on Architecture and Code Optimization (TACO)
Rank idle time prediction driven last-level cache writeback

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness

Abstract

Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs exhibit wildly different behavior even at the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling to feedback-directed optimization to the way programs are simulated. However, to take advantage of time-varying behavior, we must first develop the analytical tools needed to analyze program behavior automatically and efficiently over large sections of execution.

Our goal is to develop automatic techniques that can find and exploit the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step toward this goal is a hardware-independent metric that concisely summarizes the behavior of an arbitrary section of a program's execution. To this end, we examine the use of Basic Block Vectors. We quantify how effectively Basic Block Vectors capture program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of clustering-based algorithms capable of analyzing this behavior. We then demonstrate one application of this technology: automatically determining which portions of a program's execution to simulate, in order to help guide computer architecture research.
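The abstract does not spell out how a Basic Block Vector is built or how clustering selects simulation regions. The Python sketch below assumes one common formulation: execution is cut into fixed-length intervals; each interval's vector weights every basic block's entry count by the block's instruction count and is normalized to sum to 1; Manhattan distance compares two intervals; plain k-means groups similar intervals; and the interval closest to each centroid is taken as that cluster's simulation point, weighted by the cluster's share of intervals. All function names and the toy data are hypothetical. The sketch also omits refinements a practical tool would likely need, such as choosing k automatically or reducing dimensionality for programs with many basic blocks.

```python
import random
from collections import Counter


def basic_block_vector(entries, block_sizes, num_blocks):
    """Build a normalized BBV for one interval (assumed formulation).

    entries: Counter mapping basic-block id -> times the block was entered
    block_sizes: list of instruction counts, indexed by block id
    """
    vec = [0.0] * num_blocks
    for block, count in entries.items():
        vec[block] = count * block_sizes[block]  # weight by block size
    total = sum(vec)
    # Normalize so intervals of different lengths remain comparable.
    return [v / total for v in vec] if total else vec


def manhattan(a, b):
    """Manhattan distance between two normalized BBVs (0 = same, 2 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sq_euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    return min(range(len(centroids)), key=lambda c: sq_euclid(v, centroids[c]))


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, per-interval cluster ids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = None
    for _ in range(iters):
        new_assign = [nearest(v, centroids) for v in vectors]
        if new_assign == assign:  # converged: no interval changed cluster
            break
        assign = new_assign
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def simulation_points(vectors, k, seed=0):
    """Return sorted (interval index, weight) pairs, one per non-empty cluster."""
    centroids, assign = kmeans(vectors, k, seed=seed)
    points = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            rep = min(members, key=lambda i: sq_euclid(vectors[i], centroids[c]))
            points.append((rep, len(members) / len(vectors)))
    return sorted(points)


# Hypothetical toy program: 4 basic blocks and two phases.
# Intervals 0, 1, 3 run blocks 0-1; intervals 2, 4, 5 run blocks 2-3.
block_sizes = [4, 2, 5, 3]
intervals = [
    Counter({0: 10, 1: 5}), Counter({0: 11, 1: 5}), Counter({2: 8, 3: 4}),
    Counter({0: 10, 1: 6}), Counter({2: 9, 3: 4}), Counter({2: 8, 3: 5}),
]
vectors = [basic_block_vector(c, block_sizes, 4) for c in intervals]
points = simulation_points(vectors, k=2)  # -> [(0, 0.5), (2, 0.5)]
```

In this toy run, one interval per phase is selected, and each carries half the weight. Under these assumptions, simulating only intervals 0 and 2 and combining their results with weights 0.5 would approximate the whole run. k-means is used here only because it is simple; any clustering method that groups intervals with similar vectors would fit the same pipeline.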