Instruction issue logic for high-performance, interruptable pipelined processors
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems
IEEE Transactions on Computers
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Effectiveness of trace sampling for performance debugging tools
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Reducing indirect function call overhead in C++ programs
POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Fast and accurate instruction fetch and branch prediction
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The effect of speculatively updating branch history on branch prediction accuracy, revisited
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Improving the accuracy of static branch prediction using branch correlation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A comparative analysis of schemes for correlated branch prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Alternative implementations of hybrid branch predictors
Proceedings of the 28th annual international symposium on Microarchitecture
Correlation and aliasing in dynamic branch predictors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Analysis techniques for predicated code
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Compiler synthesized dynamic branch prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Wrong-path instruction prefetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
Memory-system design considerations for dynamically-scheduled processors
Proceedings of the 24th annual international symposium on Computer architecture
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
Target prediction for indirect jumps
Proceedings of the 24th annual international symposium on Computer architecture
The agree predictor: a mechanism for reducing negative branch history interference
Proceedings of the 24th annual international symposium on Computer architecture
Trading conflict and capacity aliasing in conditional branch predictors
Proceedings of the 24th annual international symposium on Computer architecture
A language for describing predictors and its application to automatic synthesis
Proceedings of the 24th annual international symposium on Computer architecture
Run-time adaptive cache hierarchy management via reference analysis
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Path-based next trace prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Multipath execution: opportunities and limits
ICS '98 Proceedings of the 12th international conference on Supercomputing
Kin: a high performance asynchronous processor architecture
ICS '98 Proceedings of the 12th international conference on Supercomputing
An analysis of correlation and predictability: what makes two-level branch predictors work
Proceedings of the 25th annual international symposium on Computer architecture
Accurate indirect branch prediction
Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture
Proceedings of the 25th annual international symposium on Computer architecture
International Journal of Parallel Programming
The YAGS branch prediction scheme
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Load latency tolerance in dynamically scheduled processors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving prediction for procedure returns with return-address-stack repair mechanisms
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Design Issues and Tradeoffs for Write Buffers
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
The Alpha 21264 Microprocessor Architecture
ICCD '98 Proceedings of the International Conference on Computer Design
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
ACM Transactions on Computer Systems (TOCS)
A time-stamping algorithm for efficient performance estimation of superscalar processors
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Workload characterization of emerging computer applications
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A microprocessor survey course for learning advanced computer architecture
SIGCSE '02 Proceedings of the 33rd SIGCSE technical symposium on Computer science education
Parallel simulation of chip-multiprocessor architectures
ACM Transactions on Modeling and Computer Simulation (TOMACS)
The performance advantage of applying compression to the memory system
Proceedings of the 2002 workshop on Memory system performance
A Statistically Rigorous Approach for Improving Simulation Methodology
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Evaluation and choice of various branch predictors for low-power embedded processor
Journal of Computer Science and Technology
Efficient simulation of trace samples on parallel machines
Parallel Computing
Alloyed branch history: combining global and local branch history for robust performance
International Journal of Parallel Programming
Accelerated warmup for sampled microarchitecture simulation
ACM Transactions on Architecture and Code Optimization (TACO)
Optimal sample length for efficient cache simulation
Journal of Systems Architecture: the EUROMICRO Journal
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Yet shorter warmup by combining no-state-loss and MRRL for sampled LRU cache simulation
Journal of Systems and Software - Special issue: Quality software
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Applying Statistical Sampling for Fast and Efficient Simulation of Commercial Workloads
IEEE Transactions on Computers
International Journal of High Performance Computing and Networking
Superscalar architecture design for high performance DSP operations
Microprocessors & Microsystems
Finding extreme behaviors in microprocessor workloads
Transactions on High-Performance Embedded Architectures and Compilers IV
Design space exploration of hybrid ultra low power branch predictors
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Hi-index | 14.99 |
Design parameters interact in complex ways in modern processors, especially because out-of-order issue and decoupling buffers allow latencies to be overlapped. Trade-offs among instruction-window size, branch-prediction accuracy, and instruction- and data-cache size can change as these parameters move through different domains. For example, modeling unrealistic caches can under- or overstate the benefits of better prediction or a larger instruction window. Avoiding such pitfalls requires understanding how all these parameters interact. Because such methodological mistakes are common, this paper provides a comprehensive set of SimpleScalar simulation results from SPECint95 programs, showing the interactions among these major structures. In addition to presenting this database of simulation results, major mechanisms driving the observed trade-offs are described. The paper also considers appropriate simulation techniques when sampling full-length runs with the SPEC reference inputs. In particular, the results show that branch mispredictions limit the benefits of larger instruction windows, that better branch prediction and better instruction cache behavior have synergistic effects, and that the benefits of larger instruction windows and larger data caches trade off and have overlapping effects. In addition, simulations of only 50 million instructions can yield representative results if these short windows are carefully selected.