Abstract: Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry-standard benchmark can take weeks to months. To overcome this problem, researchers evaluate their results on a very small portion of a program's execution rather than simulating the entire program. In this paper we propose Basic Block Distribution Analysis, an automated approach for finding small portions of a program to simulate that are representative of the entire program's execution. The approach uses profiles of a program's code structure (basic blocks) to uniquely identify different phases of execution in the program. We show that the periodicity of the basic block frequency profile reflects the periodicity of detailed simulation across several different architectural metrics (e.g., IPC, branch miss rate, cache miss rate, value misprediction, address misprediction, and reorder buffer occupancy). Since basic block frequencies can be collected using very fast profiling tools, our approach provides a practical technique for finding the periodicity and simulation points in applications.
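To make the idea of comparing basic block frequency profiles concrete, here is a minimal Python sketch. It is not the paper's exact algorithm. It assumes the profile is a flat list of basic block IDs in execution order, splits it into fixed-length intervals, and picks the interval whose normalized block frequencies are closest to those of the whole run. The trace format, the function names, and the choice of Manhattan distance are all assumptions made for illustration.

```python
from collections import Counter


def block_distribution(trace):
    """Return the normalized execution frequency of each basic block ID."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {block: n / total for block, n in counts.items()}


def manhattan_distance(p, q):
    """Sum of absolute frequency differences over all blocks in either profile."""
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in set(p) | set(q))


def most_representative_interval(trace, interval_len):
    """Pick the fixed-length interval closest to the whole-program profile.

    `trace` is a hypothetical list of basic block IDs in execution order.
    Returns (index of the best interval, list of per-interval distances).
    """
    if interval_len <= 0 or not trace:
        raise ValueError("need a non-empty trace and a positive interval length")
    target = block_distribution(trace)
    intervals = [trace[i:i + interval_len]
                 for i in range(0, len(trace), interval_len)]
    distances = [manhattan_distance(block_distribution(iv), target)
                 for iv in intervals]
    best = min(range(len(intervals)), key=distances.__getitem__)
    return best, distances


if __name__ == "__main__":
    # Toy trace: two skewed intervals followed by one that mirrors the overall mix.
    toy_trace = [1, 2, 1, 2, 3, 3, 3, 3, 1, 2, 3, 3]
    index, dists = most_representative_interval(toy_trace, interval_len=4)
    print(index, dists)
```

In this toy trace the third interval has the same block mix as the whole run, so it is the one selected. A real profile would come from a profiling tool and would use far longer intervals.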