Robust Clustering with Applications in Computer Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
ACM Computing Surveys (CSUR)
HLS: combining statistical and symbolic simulation to guide microprocessor designs
Proceedings of the 27th annual international symposium on Computer architecture
Workload characterization of emerging computer applications
Neural Networks for Pattern Recognition
Neural Networks for Pattern Recognition
Reducing State Loss For Effective Trace Sampling of Superscalar Processors
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Modeling Superscalar Processors via Statistical Simulation
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Experiments with Random Projection
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Minimal Subset Evaluation: Rapid Warm-Up for Simulated Hardware State
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation
Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic trace selection using performance monitoring hardware sampling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Phi-Predication for light-weight if-conversion
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A framework for modeling and optimization of prescient instruction prefetch
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Run-time modeling and estimation of operating system power consumption
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using SimPoint for accurate and efficient simulation
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicate prediction for efficient out-of-order execution
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Catching Accurate Profiles in Hardware
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Positional adaptation of processors: application to energy reduction
Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Reducing power density through activity migration
Proceedings of the 2003 international symposium on Low power electronics and design
Comparing Program Phase Detection Techniques
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
VHC: Quickly Building an Optimizer for Complex Embedded Architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP
ACM Transactions on Architecture and Code Optimization (TACO)
Circuit and microarchitectural techniques for reducing cache leakage power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Locality-Based Online Trace Compression
IEEE Transactions on Computers
Circuit-aware architectural simulation
Proceedings of the 41st annual Design Automation Conference
EXPERT: expedited simulation exploiting program behavior repetition
Proceedings of the 18th annual international conference on Supercomputing
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor
Proceedings of the 31st annual international symposium on Computer architecture
Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies
Proceedings of the 31st annual international symposium on Computer architecture
Efficient simulation of trace samples on parallel machines
Parallel Computing
A low-power in-order/out-of-order issue queue
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing pipeline energy demands with local DVS and dynamic retiming
Proceedings of the 2004 international symposium on Low power electronics and design
Profile-based adaptation for cache decay
ACM Transactions on Architecture and Code Optimization (TACO)
HIDE: an infrastructure for efficiently protecting information leakage on the address bus
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Fingerprinting: bounding soft-error detection latency and bandwidth
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Architectural Support for Enhanced SMT Job Scheduling
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Method-level phase behavior in java workloads
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Automatic Synthesis of High-Speed Processor Simulators
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Control Flow Optimization Via Dynamic Reconvergence Prediction
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Hardware and Binary Modification Support for Code Pointer Protection From Buffer Overflow
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Maintaining Consistency and Bounding Capacity of Software Code Caches
Proceedings of the international symposium on Code generation and optimization
Proceedings of the international symposium on Code generation and optimization
Reactive Techniques for Controlling Software Speculation
Proceedings of the international symposium on Code generation and optimization
DVS for On-Chip Bus Designs Based on Timing Error Correction
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Automatic Construction and Evaluation of Performance Skeletons
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Using Phase Behavior in Scientific Application to Guide Linux Operating System Customization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Toward an Evaluation Infrastructure for Power and Energy Optimizations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
How to use SimPoint to pick simulation points
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
A compressed memory hierarchy using an indirect index cache
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Accelerated warmup for sampled microarchitecture simulation
ACM Transactions on Architecture and Code Optimization (TACO)
Owl: next generation system monitoring
Proceedings of the 2nd conference on Computing frontiers
Controlling leakage power with the replacement policy in slumberous caches
Proceedings of the 2nd conference on Computing frontiers
Scheduling for heterogeneous processors in server systems
Proceedings of the 2nd conference on Computing frontiers
On the energy-efficiency of speculative hardware
Proceedings of the 2nd conference on Computing frontiers
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Replicating memory behavior for performance prediction
LCR '04 Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems
High Efficiency Counter Mode Security Architecture via Prediction and Precomputation
Proceedings of the 32nd annual international symposium on Computer Architecture
A Robust Main-Memory Compression Scheme
Proceedings of the 32nd annual international symposium on Computer Architecture
Rescue: A Microarchitecture for Testability and Defect Tolerance
Proceedings of the 32nd annual international symposium on Computer Architecture
Opportunistic Transient-Fault Detection
Proceedings of the 32nd annual international symposium on Computer Architecture
Piecewise Linear Branch Prediction
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Buffer Design in First-Level Multibanked Data Caches
Proceedings of the 32nd annual international symposium on Computer Architecture
Computing Architectural Vulnerability Factors for Address-Based Structures
Proceedings of the 32nd annual international symposium on Computer Architecture
Visualization and analysis of phased behavior in Java programs
Proceedings of the 3rd international symposium on Principles and practice of programming in Java
Understanding the energy efficiency of SMT and CMP with multiclustering
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A multinomial clustering model for fast simulation of computer architecture designs
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Dynamic detection and visualization of software phases
WODA '05 Proceedings of the third international workshop on Dynamic analysis
Dynamic phase analysis for cycle-close trace generation
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Fast and fair: data-stream quality of service
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Online performance analysis by statistical sampling of microprocessor performance counters
Proceedings of the 19th annual international conference on Supercomputing
Improved automatic testcase synthesis for performance model validation
Proceedings of the 19th annual international conference on Supercomputing
The implications of working set analysis on supercomputing memory hierarchy design
Proceedings of the 19th annual international conference on Supercomputing
Exploring the limits of leakage power reduction in caches
ACM Transactions on Architecture and Code Optimization (TACO)
Merging path and gshare indexing in perceptron branch prediction
ACM Transactions on Architecture and Code Optimization (TACO)
Dual-Core Execution: Building a Highly Scalable Single-Thread Instruction Window
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Future Execution: A Hardware Prefetching Technique for Chip Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Reducing the Latency and Area Cost of Core Swapping through Shared Helper Engines
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism.
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Reducing Instruction Fetch Cost by Packing Instructions into RegisterWindows
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Improving Region Selection in Dynamic Optimization Systems
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Mechanism for Online Diagnosis of Hard Faults in Microprocessors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Autonomic Microprocessor Execution via Self-Repairing Arrays
IEEE Transactions on Dependable and Secure Computing
DBmbench: fast and accurate database workload representation on modern microarchitecture
CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
Improving memory system performance with energy-efficient value speculation
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Bridging the Processor-Memory Performance Gapwith 3D IC Technology
IEEE Design & Test
Optimal sample length for efficient cache simulation
Journal of Systems Architecture: the EUROMICRO Journal
On the importance of optimizing the configuration of stream prefetchers
Proceedings of the 2005 workshop on Memory system performance
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Revised Stride Data Value Predictor Design
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Design, implementation, and verification of active cache emulator (ACE)
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Placement for configurable dataflow architecture
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Microarchitecture evaluation with floorplanning and interconnect pipelining
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Online Phase Detection Algorithms
Proceedings of the International Symposium on Code Generation and Optimization
Region Monitoring for Local Phase Detection in Dynamic Optimization Systems
Proceedings of the International Symposium on Code Generation and Optimization
Selecting Software Phase Markers with Code Structure Analysis
Proceedings of the International Symposium on Code Generation and Optimization
2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set
Proceedings of the International Symposium on Code Generation and Optimization
Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The Camino Compiler infrastructure
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Low cost trace-driven memory simulation using SimPoint
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Speculative early register release
Proceedings of the 3rd conference on Computing frontiers
Evaluation of the field-programmable cache: performance and energy consumption
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Phase-based visualization and analysis of Java programs
Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
Efficient remote profiling for resource-constrained devices
ACM Transactions on Architecture and Code Optimization (TACO)
Decomposing memory performance: data structures and phases
Proceedings of the 5th international symposium on Memory management
Techniques for Multicore Thermal Management: Classification and New Exploration
Proceedings of the 33rd annual international symposium on Computer Architecture
Learning-Based SMT Processor Resource Distribution via Hill-Climbing
Proceedings of the 33rd annual international symposium on Computer Architecture
Measuring Benchmark Similarity Using Inherent Program Characteristics
IEEE Transactions on Computers
Instruction packing: Toward fast and energy-efficient instruction scheduling
ACM Transactions on Architecture and Code Optimization (TACO)
Automatic logging of operating system effects to guide application-level architecture simulation
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Applying architectural vulnerability Analysis to hard faults in the microprocessor
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
SMA: a self-monitored adaptive cache warm-up scheme for microprocessor simulation
International Journal of Parallel Programming
A systematic method for functional unit power estimation in microprocessors
Proceedings of the 43rd annual Design Automation Conference
Statistical sampling of microarchitecture simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Design and analysis of spatial encoding circuits for peak power reduction in on-chip buses
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Extracting and improving microarchitecture performance on reconfigurable architectures
International Journal of Parallel Programming - Special issue: The next generation software program
Core architecture optimization for heterogeneous chip multiprocessors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Wavelet-based phase classification
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Complexity-based program phase analysis and classification
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Performance prediction based on inherent program similarity
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Adaptive reorder buffers for SMT processors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Long-latency branches: how much do they matter?
ACM SIGARCH Computer Architecture News
Efficient Sampling Startup for SimPoint
IEEE Micro
Register file caching for energy efficiency
Proceedings of the 2006 international symposium on Low power electronics and design
Power efficiency for variation-tolerant multicore processors
Proceedings of the 2006 international symposium on Low power electronics and design
Data prefetching in a cache hierarchy with high bandwidth and capacity
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
SlicK: slice-based locality exploitation for efficient redundant multithreading
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Accurate and efficient regression modeling for microarchitectural performance and power prediction
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Efficiently exploring architectural design spaces via predictive modeling
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
HeapMD: identifying heap-based bugs using anomaly detection
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Automatic performance model construction for the fast software exploration of new hardware designs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Yet shorter warmup by combining no-state-loss and MRRL for sampled LRU cache simulation
Journal of Systems and Software - Special issue: Quality software
The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools
Proceedings of the 20th annual international conference on Supercomputing
Accurate memory data flow modeling in statistical simulation
Proceedings of the 20th annual international conference on Supercomputing
A scalable low power issue queue for large instruction window processors
Proceedings of the 20th annual international conference on Supercomputing
IEEE Transactions on Computers
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
Adaptive, transparent frequency and voltage scaling of communication phases in MPI programs
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
ZettaRAM: A Power-Scalable DRAM Alternative through Charge-Voltage Decoupling
IEEE Transactions on Computers
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
Authentication Control Point and Its Implications For Secure Processor Design
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
DMDC: Delayed Memory Dependence Checking through Age-Based Filtering
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Mitigating the Impact of Process Variations on Processor Register Files and Execution Units
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
NSL-BLRL: Efficient CacheWarmup for Sampled Processor Simulation
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
Design and Implementation of aWorkload Specific Simulator
ANSS '06 Proceedings of the 39th annual Symposium on Simulation
M-TREE: a high efficiency security architecture for protecting integrity and privacy of software
Journal of Parallel and Distributed Computing - Special issue: Security in grid and distributed systems
A comparison of online and offline strategies for program adaptation
ACM-SE 45 Proceedings of the 45th annual southeast regional conference
Analysis of hardware prefetching across virtual page boundaries
Proceedings of the 4th international conference on Computing frontiers
Unified microprocessor core storage
Proceedings of the 4th international conference on Computing frontiers
Accelerating memory decryption and authentication with frequent value prediction
Proceedings of the 4th international conference on Computing frontiers
By-passing the out-of-order execution pipeline to increase energy-efficiency
Proceedings of the 4th international conference on Computing frontiers
Computational and storage power optimizations for the O-GEHL branch predictor
Proceedings of the 4th international conference on Computing frontiers
Speculative trivialization point advancing in high-performance processors
Journal of Systems Architecture: the EUROMICRO Journal
Exploiting program phase behavior for energy reduction on multi-configuration processors
Journal of Systems Architecture: the EUROMICRO Journal
Thermal modeling and management of DRAM memory systems
Proceedings of the 34th annual international symposium on Computer architecture
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite
Proceedings of the 34th annual international symposium on Computer architecture
Transparent control independence (TCI)
Proceedings of the 34th annual international symposium on Computer architecture
Mechanisms for bounding vulnerabilities of processor structures
Proceedings of the 34th annual international symposium on Computer architecture
Dynamic prediction of architectural vulnerability from microarchitectural state
Proceedings of the 34th annual international symposium on Computer architecture
Online diagnosis of hard faults in microprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Improved error reporting for software that uses black-box components
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Microarchitecture Sensitive Empirical Models for Compiler Optimizations
Proceedings of the International Symposium on Code Generation and Optimization
Integrated CPU and l2 cache voltage scaling using machine learning
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Efficient power modeling and software thermal sensing for runtime temperature monitoring
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications
IEEE Transactions on Parallel and Distributed Systems
Transient fault prediction based on anomalies in processor events
Proceedings of the conference on Design, automation and test in Europe
Proceedings of the 2007 international symposium on Software testing and analysis
An L2-miss-driven early register deallocation for SMT processors
Proceedings of the 21st annual international conference on Supercomputing
Optimization of data prefetch helper threads with path-expression based statistical modeling
Proceedings of the 21st annual international conference on Supercomputing
Increasing cache capacity through word filtering
Proceedings of the 21st annual international conference on Supercomputing
Cross-component energy management: Joint adaptation of processor and memory
ACM Transactions on Architecture and Code Optimization (TACO)
An analysis of timing violations due to spatially distributed thermal effects in global wires
Proceedings of the 44th annual Design Automation Conference
Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
HySim: a fast simulation framework for embedded software development
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
A fast and generic hybrid simulation approach using C virtual machine
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
The 6th Joint Meeting on European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering: companion papers
Applying Statistical Sampling for Fast and Efficient Simulation of Commercial Workloads
IEEE Transactions on Computers
Speed versus Accuracy Trade-Offs in Microarchitectural Simulations
IEEE Transactions on Computers
Data prefetching in a cache hierarchy with high bandwidth and capacity
ACM SIGARCH Computer Architecture News
Efficient architectural design space exploration via predictive modeling
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing cache misses through programmable decoders
ACM Transactions on Architecture and Code Optimization (TACO)
On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance
IEEE Transactions on Computers
IEEE Transactions on Computers
Efficiency trends and limits from comprehensive microarchitectural adaptivity
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accurate branch prediction for short threads
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Dispersing proprietary applications as benchmarks through code mutation
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
A superscalar simulation employing poisson distributed stalls
Computers and Electrical Engineering
Rent's rule and parallel programs: characterizing network traffic behavior
Proceedings of the 2008 international workshop on System level interconnect prediction
Performance prediction with skeletons
Cluster Computing
Memory performance attacks: denial of memory service in multi-core systems
SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Communications of the ACM - Web science
Phase-based cache reconfiguration for a highly-configurable two-level cache hierarchy
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Proceedings of the 5th conference on Computing frontiers
Asymmetrically banked value-aware register files for low-energy and high-performance
Microprocessors & Microsystems
Software-directed combined cpu/link voltage scaling fornoc-based cmps
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Focused prefetching: performance oriented prefetching based on commit stalls
Proceedings of the 22nd annual international conference on Supercomputing
Predictive thread-to-core assignment on a heterogeneous multi-core processor
Proceedings of the 4th workshop on Programming languages and operating systems
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Automated hardware-independent scenario identification
Proceedings of the 45th annual Design Automation Conference
Improve simulation efficiency using statistical benchmark subsetting: an ImplantBench case study
Proceedings of the 45th annual Design Automation Conference
Proceedings of the 13th international symposium on Low power electronics and design
Distilling the essence of proprietary workloads into miniature benchmarks
ACM Transactions on Architecture and Code Optimization (TACO)
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Multi-granularity sampling for simulating concurrent heterogeneous applications
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Towards realistic file-system benchmarks with CodeMRI
ACM SIGMETRICS Performance Evaluation Review
Multi-optimization power management for chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Analysis and approximation of optimal co-scheduling on chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
System-scenario-based design of dynamic embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Reducing register pressure in SMT processors through L2-miss-driven early register release
ACM Transactions on Architecture and Code Optimization (TACO)
Speculative return address stack management revisited
ACM Transactions on Architecture and Code Optimization (TACO)
Analysing and improving clustering based sampling for microprocessor simulation
International Journal of High Performance Computing and Networking
Hill-climbing SMT processor resource distribution
ACM Transactions on Computer Systems (TOCS)
COTSon: infrastructure for full system simulation
ACM SIGOPS Operating Systems Review
Generalizing neural branch prediction
ACM Transactions on Architecture and Code Optimization (TACO)
MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Finding Stress Patterns in Microprocessor Workloads
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations
Transactions on High-Performance Embedded Architectures and Compilers I
Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking
Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking
Using age registers for a simple load-store queue filtering
Journal of Systems Architecture: the EUROMICRO Journal
Per-thread cycle accounting in SMT processors
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Zero loads: canceling load requests by tracking zero values
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Memory-level parallelism aware fetch policies for simultaneous multithreading processors
ACM Transactions on Architecture and Code Optimization (TACO)
A swarm-inspired resource distribution for SMT processors
Proceedings of the 3rd International Conference on Bio-Inspired Models of Network, Information and Computing Sytems
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
NBTI tolerant microarchitecture design in the presence of process variation
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Low-power, high-performance analog neural branch prediction
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A mechanistic performance model for superscalar out-of-order processors
ACM Transactions on Computer Systems (TOCS)
Branch Predictor Warmup for Sampled Simulation through Branch History Matching
Transactions on High-Performance Embedded Architectures and Compilers II
Data Cache Techniques to Save Power and Deliver High Performance in Embedded Systems
Transactions on High-Performance Embedded Architectures and Compilers II
Combining Edge Vector and Event Counter for Time-Dependent Power Behavior Characterization
Transactions on High-Performance Embedded Architectures and Compilers II
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Power-Aware Bus Coscheduling for Periodic Realtime Applications Running on Multiprocessor SoC
Transactions on High-Performance Embedded Architectures and Compilers II
Creating artificial global history to improve branch prediction accuracy
Proceedings of the 23rd international conference on Supercomputing
Workload Reduction for Multi-input Feedback-Directed Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
An energy-efficient instruction scheduler design with two-level shelving and adaptive banking
Journal of Computer Science and Technology
NAP: a building block for remediating performance bottlenecks via black box network analysis
ICAC '09 Proceedings of the 6th international conference on Autonomic computing
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
Support for Urgent Computing Based on Resource Virtualization
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Enabling ultra low voltage system operation by tolerating on-chip cache failures
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Energy-efficient register caching with compiler assistance
ACM Transactions on Architecture and Code Optimization (TACO)
Phase detection using trace compilation
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Hybrid Techniques for Fast Multicore Simulation
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Architecture Design for Soft Errors
Architecture Design for Soft Errors
Selective wordline voltage boosting for caches to manage yield under process variations
Proceedings of the 46th Annual Design Automation Conference
Trace-driven workload simulation method for Multiprocessor System-On-Chips
Proceedings of the 46th Annual Design Automation Conference
Enabling software management for multicore caches with a lightweight hardware support
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Space-efficient time-series call-path profiling of parallel applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Future scaling of processor-memory interfaces
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Reducing peak power with a table-driven adaptive processor core
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Quantifying hardware counter sampling error in computer system workload characterization
Quantifying hardware counter sampling error in computer system workload characterization
Multicore power management: ensuring robustness via early-stage formal verification
MEMOCODE'09 Proceedings of the 7th IEEE/ACM international conference on Formal Methods and Models for Codesign
A cross-layer approach to heterogeneity and reliability
MEMOCODE'09 Proceedings of the 7th IEEE/ACM international conference on Formal Methods and Models for Codesign
Utilizing predictors for efficient thermal management in multiprocessor SoCs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Accurately evaluating application performance in simulated hybrid multi-tasking systems
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Probabilistic job symbiosis modeling for SMT processor scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Characterizing processor thermal behavior
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Shoestring: probabilistic soft error reliability on the cheap
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Software—Practice & Experience
Performance modeling for dynamic algorithm selection
ICCS'03 Proceedings of the 2003 international conference on Computational science
Branch history matching: branch predictor warmup for sampled simulation
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Efficient program power behavior characterization
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Exploiting stability to reduce time-space cost for memory tracing
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
A self-adjusting code cache manager to balance start-up time and memory usage
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Exploiting statistical correlations for proactive prediction of program behaviors
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Accelerating multi-core simulators
Proceedings of the 2010 ACM Symposium on Applied Computing
Reuse distance based cache leakage control
HiPC'07 Proceedings of the 14th international conference on High performance computing
A multi-level approach to reduce the impact of NBTI on processor functional units
Proceedings of the 20th symposium on Great lakes symposium on VLSI
A model to exploit power-performance efficiency in superscalar processors via structure resizing
Proceedings of the 20th symposium on Great lakes symposium on VLSI
A self-adaptive scheduler for asymmetric multi-cores
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Using dynamic binary instrumentation to generate multi-platform SimPoints: methodology and accuracy
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Phase complexity surfaces: characterizing time-varying program behavior
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
EXACT: explicit dynamic-branch prediction with active updates
Proceedings of the 7th ACM international conference on Computing frontiers
Reusing cached schedules in an out-of-order processor with in-order issue logic
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Rapid early-stage microarchitecture design using predictive models
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Efficient resource provisioning in compute clouds via VM multiplexing
Proceedings of the 7th international conference on Autonomic computing
Adaptive simulation sampling using an autoregressive framework
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Necromancer: enhancing system throughput by animating dead cores
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 37th annual international symposium on Computer architecture
Consistent runtime thermal prediction and control through workload phase detection
Proceedings of the 47th Design Automation Conference
Automated modeling and emulation of interconnect designs for many-core chip multiprocessors
Proceedings of the 47th Design Automation Conference
Automatic Phase Detection and Structure Extraction of MPI Applications
International Journal of High Performance Computing Applications
Applied inference: Case studies in microarchitectural design
ACM Transactions on Architecture and Code Optimization (TACO)
Criticality-driven superscalar design space exploration
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Partitioning streaming parallelism for multi-cores: a machine learning based approach
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Evaluating the dynamic behaviour of Python applications
ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults
Proceedings of the Conference on Design, Automation and Test in Europe
A reconfigurable cache memory with heterogeneous banks
Proceedings of the Conference on Design, Automation and Test in Europe
Dueling CLOCK: adaptive cache replacement policy based on the CLOCK algorithm
Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting narrow-width values for thermal-aware register file designs
Proceedings of the Conference on Design, Automation and Test in Europe
Characterization and exploitation of narrow-width loads: the narrow-width cache approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Detailed performance analysis using coarse grain sampling
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Comparing scalability prediction strategies on an SMP of CMPs
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Dynamic program phase detection in distributed shared- memory multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
SubsetTrio: An evolutionary, geometric, and statistical benchmark subsetting framework
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Combating Aging with the Colt Duty Cycle Equalizer
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
ScalaExtrap: trace-based communication extrapolation for spmd programs
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Extended histories: improving regularity and performance in correlation prefetchers
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Runtime parallelization of legacy code on a transactional memory system
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
A scalable circuit-architecture co-design to improve memory yield for high-performance processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Fine-grained DVFS using on-chip regulators
ACM Transactions on Architecture and Code Optimization (TACO)
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Automatic estimation of performance requirements for software tasks of mobile devices
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
On the exploitation of narrow-width values for improving register file reliability
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs?
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Automatic performance debugging of SPMD-style parallel programs
Journal of Parallel and Distributed Computing
Design of last-level on-chip cache using spin-torque transfer RAM (STT RAM)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Analysis of application-aware on-chip routing under traffic uncertainty
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Proceedings of the 38th annual international symposium on Computer architecture
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
Releasing efficient beta cores to market early
Proceedings of the 38th annual international symposium on Computer architecture
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs
Proceedings of the 8th ACM International Conference on Computing Frontiers
Proceedings of the 8th ACM International Conference on Computing Frontiers
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
ACM Transactions on Architecture and Code Optimization (TACO)
CATCH: A mechanism for dynamically detecting cache-content-duplication in instruction caches
ACM Transactions on Architecture and Code Optimization (TACO)
Managing SMT resource usage through speculative instruction window weighting
ACM Transactions on Architecture and Code Optimization (TACO)
A unified approach to eliminate memory accesses early
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Segmented bitline cache: exploiting non-uniform memory access patterns
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Performance and power evaluation of an intelligently adaptive data cache
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Dynamic co-allocation of level one caches
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
A practical method for quickly evaluating program optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Efficient sampling startup for sampled processor simulation
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Enhancing network processor simulation speed with statistical input sampling
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Improving System Energy Efficiency with Memory Rank Subsetting
ACM Transactions on Architecture and Code Optimization (TACO)
A code isolator: isolating code fragments from large programs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A detailed study on phase predictors
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Improving accuracy of perceptron predictor through correlating data values in SMT processors
ISNN'05 Proceedings of the Second international conference on Advances in Neural Networks - Volume Part III
RDVIS: a tool that visualizes the causes of low locality and hints program optimizations
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Offline phase analysis and optimization for multi-configuration processors
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
A power-efficient and scalable load-store queue design
PATMOS'05 Proceedings of the 15th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
CoreRacer: a practical memory race recorder for multicore x86 TSO processors
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ScalaExtrap: Trace-based communication extrapolation for SPMD programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Global register alias table: Boosting sequential program on multi-core
Future Generation Computer Systems
Resource-Driven optimizations for transient-fault detecting superscalar microarchitectures
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Making power-efficient data value predictions
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Low-overhead core swapping for thermal management
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Modeling runtime behavior in framework-based applications
ECOOP'06 Proceedings of the 20th European conference on Object-Oriented Programming
Characterizing time-varying program behavior using phase complexity surfaces
Transactions on High-Performance Embedded Architectures and Compilers IV
Microvisor: a runtime architecture for thermal management in chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
Finding extreme behaviors in microprocessor workloads
Transactions on High-Performance Embedded Architectures and Compilers IV
Extrinsic and intrinsic text cloning
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Studying hardware and software trade-offs for a real-life web 2.0 workload
ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Phase-based tuning for better utilization of performance-asymmetric multicore processors
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Probabilistic modeling for job symbiosis scheduling on SMT processors
ACM Transactions on Architecture and Code Optimization (TACO)
Rank idle time prediction driven last-level cache writeback
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Trace-driven simulation of memory system scheduling in multithread application
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Improving dynamic prediction accuracy through multi-level phase analysis
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Providing fairness on shared-memory multiprocessors via process scheduling
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Proceedings of the 26th ACM international conference on Supercomputing
Extracting the optimal sampling frequency of applications using spectral analysis
Concurrency and Computation: Practice & Experience
Thermal-aware sampling in architectural simulation
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
XIOSim: power-performance modeling of mobile x86 cores
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A first-order mechanistic model for architectural vulnerability factor
Proceedings of the 39th Annual International Symposium on Computer Architecture
Improving writeback efficiency with decoupled last-write prediction
Proceedings of the 39th Annual International Symposium on Computer Architecture
FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion
Proceedings of the 39th Annual International Symposium on Computer Architecture
Harmony: collection and analysis of parallel block vectors
Proceedings of the 39th Annual International Symposium on Computer Architecture
Power-aware multi-core simulation for early design stage hardware/software co-optimization
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Frances: A Tool for Understanding Computer Architecture and Assembly Language
ACM Transactions on Computing Education (TOCE)
Architectural implications of spatial thermal filtering
Integration, the VLSI Journal
MAGE: adaptive granularity and ECC for resilient and power efficient memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hardware-software coherence protocol for the coexistence of caches and local memories
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Automatic structure extraction from MPI applications tracefiles
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
A power-aware alternative for the perceptron branch predictor
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Per-thread cycle accounting in multicore processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors
Computers and Electrical Engineering
Accurately modeling superscalar processor performance with reduced trace
Journal of Parallel and Distributed Computing
ACM Transactions on Architecture and Code Optimization (TACO)
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Inferred Models for Dynamic and Sparse Hardware-Software Spaces
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Accelerating GPGPU architecture simulation
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
eDoctor: automatically diagnosing abnormal battery drain issues on smartphones
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Proceedings of the ACM International Conference on Computing Frontiers
Combating NBTI-induced aging in data caches
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Memory array protection: check on read or check on write?
Proceedings of the Conference on Design, Automation and Test in Europe
Multi-level phase analysis for sampling simulation
Proceedings of the Conference on Design, Automation and Test in Europe
Capturing vulnerability variations for register files
Proceedings of the Conference on Design, Automation and Test in Europe
Reducing memory access latency with asymmetric DRAM bank organizations
Proceedings of the 40th Annual International Symposium on Computer Architecture
Resilient die-stacked DRAM caches
Proceedings of the 40th Annual International Symposium on Computer Architecture
ZSim: fast and accurate microarchitectural simulation of thousand-core systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
Enhancing NBTI recovery in SRAM arrays through recovery boosting
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Replicating tag entries for reliability enhancement in cache tag arrays
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IVF: characterizing the vulnerability of microprocessor structures to intermittent faults
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hybrid simulation for extensible processor cores
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Low power aging-aware register file design by duty cycle balancing
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
On the usefulness of object tracking techniques in performance analysis
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Using machine learning to partition streaming programs
ACM Transactions on Architecture and Code Optimization (TACO)
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
SMT-centric power-aware thread placement in chip multiprocessors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Mantis: automatic performance prediction for smartphone applications
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Computers and Electrical Engineering
TLC: a tag-less cache for reducing dynamic first level cache energy
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Insertion and promotion for tree-based PseudoLRU last-level caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Trace based phase prediction for tightly-coupled heterogeneous cores
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Speculative hardware/software co-designed floating-point multiply-add fusion
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
Warm-Up Simulation Methodology for HW/SW Co-Designed Processors
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Fine-grained Benchmark Subsetting for System Selection
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Selecting representative benchmark inputs for exploring microprocessor design spaces
ACM Transactions on Architecture and Code Optimization (TACO)
PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling
ACM Transactions on Architecture and Code Optimization (TACO)
WADE: Writeback-aware dynamic cache management for NVM-based main memory system
ACM Transactions on Architecture and Code Optimization (TACO)
Mesoscale performance simulation of multicore processor systems
Software and Systems Modeling (SoSyM)
A performance-aware quality of service-driven scheduler for multicore processors
ACM SIGBED Review - Special Issue on the 3rd Embedded Operating System Workshop (EWiLi 2013)
Hi-index | 0.04 |
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution.Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.