ACM Transactions on Computer Systems (TOCS)
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Theoretical modeling of superscalar processor performance
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
Olden: parallelizing programs with dynamic data structures on distributed-memory machines
An Analytical Model for Designing Memory Hierarchies
IEEE Transactions on Computers
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
An exploration of instruction fetch requirement in out-of-order superscalar processors
International Journal of Parallel Programming - parallel architectures and compilation techniques, part II
A discussion on non-blocking/lockup-free caches
ACM SIGARCH Computer Architecture News
Benchmark health considered harmful
ACM SIGARCH Computer Architecture News
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Asim: A Performance Model Framework
Computer
Microarchitectural exploration with Liberty
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
A Framework for Statistical Modeling of Superscalar Processor Performance
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Efficient performance prediction for modern microprocessors
Efficient performance prediction for modern microprocessors
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Toward kilo-instruction processors
ACM Transactions on Architecture and Code Optimization (TACO)
A performance counter architecture for computing accurate CPI components
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Scalable Cache Miss Handling for High Memory-Level Parallelism
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Automated design of application-specific superscalar processors
Automated design of application-specific superscalar processors
Automated design of application specific superscalar processors: an analytical approach
Proceedings of the 34th annual international symposium on Computer architecture
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Storage hierarchy optimization procedure
IBM Journal of Research and Development
On optimization of storage hierarchies
IBM Journal of Research and Development
Hi-index | 0.00 |
This article proposes techniques to predict the performance impact of pending cache hits, hardware prefetching, and miss status holding register resources on superscalar microprocessors using hybrid analytical models. The proposed models focus on timeliness of pending hits and prefetches and account for a limited number of MSHRs. They improve modeling accuracy of pending hits by 3.9× and when modeling data prefetching, a limited number of MSHRs, or both, these techniques result in average errors of 9.5% to 17.8%. The impact of non-uniform DRAM memory latency is shown to be approximated well by using a moving average of memory access latency.