A First-Order Superscalar Processor Model

Authors:
Tejas S. Karkhanis;James E. Smith
Affiliations:
Univ. of Wisconsin - Madison;Univ. of Wisconsin - Madison
Venue:
Proceedings of the 31st annual international symposium on Computer architecture
Year:
2004

Abstract

A proposed performance model for superscalar processors consists of 1) a component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions, and 2) methods for calculating transient performance penalties due to branch mispredictions, instruction cache misses, and data cache misses. Using trace-derived data dependence information, data and instruction cache miss rates, and branch miss-prediction rates as inputs, the model can arrive at performance estimates for a typical superscalar processor that are within 5.8% of detailed simulation on average and within 13% in the worst case. The model also provides insights into the workings of superscalar processors and long-term microarchitecture trends such as pipeline depths and issue widths.
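
The two-part structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the ideal issue rate follows a power law in the window size, capped at the issue width, and that each miss event adds an independent average penalty to the steady-state CPI. The names `MissEvent`, `ideal_ipc`, and `estimate_cpi`, the parameters `alpha` and `beta`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MissEvent:
    """One class of miss event; both fields are illustrative inputs."""
    per_instruction: float  # events per committed instruction (e.g. from a trace)
    penalty_cycles: float   # assumed average penalty per event, in cycles


def ideal_ipc(window_size, issue_width, alpha=1.0, beta=0.5):
    """Ideal issue rate versus window size.

    Assumption: a power-law fit alpha * window_size**beta, capped at the
    issue width. alpha and beta are hypothetical fit parameters.
    """
    return min(float(issue_width), alpha * window_size ** beta)


def estimate_cpi(window_size, issue_width, events, alpha=1.0, beta=0.5):
    """First-order estimate: steady-state CPI plus summed miss penalties.

    Assumption: miss events are treated as independent, so their
    per-instruction penalties simply add.
    """
    steady_cpi = 1.0 / ideal_ipc(window_size, issue_width, alpha, beta)
    penalty_cpi = sum(e.per_instruction * e.penalty_cycles for e in events.values())
    return steady_cpi + penalty_cpi


# Hypothetical inputs, not measurements from the paper.
example_events = {
    "branch_mispredict": MissEvent(per_instruction=0.01, penalty_cycles=10.0),
    "icache_miss": MissEvent(per_instruction=0.002, penalty_cycles=10.0),
    "dcache_miss": MissEvent(per_instruction=0.001, penalty_cycles=200.0),
}

if __name__ == "__main__":
    cpi = estimate_cpi(window_size=64, issue_width=4, events=example_events)
    print(f"estimated CPI: {cpi:.3f}")
```

Keeping the steady-state term separate from the penalty terms mirrors the split between components 1) and 2) in the abstract, so either part could be replaced with a more detailed calculation without touching the other.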