A proposed performance model for superscalar processors consists of 1) a component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions, and 2) methods for calculating transient performance penalties due to branch mispredictions, instruction cache misses, and data cache misses. Using trace-derived data dependence information, data and instruction cache miss rates, and branch misprediction rates as inputs, the model can arrive at performance estimates for a typical superscalar processor that are within 5.8% of detailed simulation on average and within 13% in the worst case. The model also provides insights into the workings of superscalar processors and long-term microarchitecture trends such as pipeline depths and issue widths.
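
The two-part structure described above can be illustrated with a minimal sketch. It is not the model from the paper: it assumes, for illustration only, that the penalties of the three miss-event types simply add to an ideal cycles-per-instruction term, and it ignores any overlap between events. The names `MissEvent` and `estimate_cpi`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MissEvent:
    """One class of miss event (hypothetical structure, not from the paper)."""
    name: str
    events_per_instruction: float  # e.g. mispredictions per instruction
    penalty_cycles: float          # assumed average cycles lost per event


def estimate_cpi(ideal_ipc: float, miss_events: Iterable[MissEvent]) -> float:
    """Estimate cycles per instruction as ideal CPI plus additive penalties.

    Assumption: penalties from different miss events are independent and
    do not overlap, so they can be summed.
    """
    if ideal_ipc <= 0:
        raise ValueError("ideal_ipc must be positive")
    base_cpi = 1.0 / ideal_ipc
    penalty_cpi = sum(e.events_per_instruction * e.penalty_cycles
                      for e in miss_events)
    return base_cpi + penalty_cpi


# Made-up inputs, chosen only to show the arithmetic.
events = [
    MissEvent("branch misprediction", 0.01, 10.0),
    MissEvent("instruction cache miss", 0.002, 10.0),
    MissEvent("data cache miss", 0.001, 200.0),
]
cpi = estimate_cpi(ideal_ipc=4.0, miss_events=events)
print(f"estimated CPI: {cpi:.3f}, estimated IPC: {1.0 / cpi:.3f}")
```

In this sketch the ideal term stands in for the first component (issue rate as a function of window size under ideal conditions) and the summed term stands in for the second (transient penalties). In practice the per-event penalties would have to come from a more careful analysis than the fixed constants used here.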