Superscalar processors obtain their performance by exploiting instruction-level parallelism in programs. Their performance is therefore limited both by the characteristics of the programs they run and by the design of the processor itself. Because of the complexity involved, estimating the performance of a superscalar processor design is difficult. Quickly predicting the performance improvement that an architectural modification will yield is harder still. In this paper, we describe and study a model of superscalar processors built from a network of Multiple Class and Multiple Resource Queues. The model captures instruction classes, instruction dependencies, the cache, the branch unit, the decoder unit, the central instruction buffer, the functional units, the retirement buffer, the retirement unit, and the instruction issue policy in an integrated manner. When validated against measured performance, the model shows an average error of 5%. Starting from this validated model, we applied sensitivity analysis and qualitatively studied three important classes of improvements that can be made to a superscalar processor's design. The resulting insights show how a good model can be used to accurately pinpoint bottlenecks and rank their relative importance, which in turn can guide development efforts.
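The following minimal sketch makes model-driven bottleneck analysis concrete. It uses a deliberately simpler technique than the paper's. It applies exact single-class Mean Value Analysis (MVA) of a closed queueing network, not the Multiple Class and Multiple Resource Queue model described above. The sketch makes three assumptions:

* Each processor unit is modeled as a single-server queue.
* A unit's service demand is the average number of cycles an instruction spends there.
* The instruction window size bounds the number of instructions in flight, which sets the closed-network population.

The unit names, demand values, and window size are hypothetical. The sensitivity step shrinks one unit's demand by 10% at a time and reports the relative throughput gain. The unit whose improvement helps most is the bottleneck.

```python
"""Hypothetical sketch: single-class exact MVA plus one-at-a-time sensitivity.

This is NOT the paper's Multiple Class and Multiple Resource Queue model; it is a
simpler, textbook closed-network technique used here only for illustration.
"""


def mva_throughput(demands: dict[str, float], population: int) -> float:
    """Exact Mean Value Analysis for a closed, single-class network.

    demands: average service demand per instruction at each station (cycles).
    population: number of instructions circulating (assumed = window size).
    Returns system throughput in instructions per cycle.
    """
    queue = {name: 0.0 for name in demands}  # mean queue length with n-1 jobs
    throughput = 0.0
    for n in range(1, population + 1):
        # Arrival theorem: an arriving job sees the queue of the (n-1)-job network.
        residence = {k: d * (1.0 + queue[k]) for k, d in demands.items()}
        throughput = n / sum(residence.values())
        # Little's law applied to each station.
        queue = {k: throughput * r for k, r in residence.items()}
    return throughput


def sensitivity(demands: dict[str, float], population: int, speedup: float = 0.10):
    """Shrink each station's demand by `speedup` in turn and report the
    relative throughput gain over the baseline."""
    base = mva_throughput(demands, population)
    gains = {}
    for name in demands:
        trial = dict(demands)
        trial[name] *= 1.0 - speedup
        gains[name] = mva_throughput(trial, population) / base - 1.0
    return base, gains


if __name__ == "__main__":
    # Hypothetical per-instruction demands in cycles (e.g. 1 / unit width).
    demands = {
        "decode": 0.25,
        "int_alu": 0.30,
        "fp_unit": 0.10,
        "load_store": 0.45,
        "retire": 0.25,
    }
    base, gains = sensitivity(demands, population=16)  # assumed window of 16
    print(f"baseline throughput: {base:.3f} IPC")
    for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{name:>10}: {gain:+.1%} if made 10% faster")
```

Throughput in a closed network saturates at 1 / (largest demand). Once the window is large, speeding up a non-bottleneck unit therefore yields little gain. This is the kind of relative ranking that a sensitivity analysis on a performance model is meant to expose.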