Quantitative system performance: computer system analysis using queueing network models
Quantitative system performance: computer system analysis using queueing network models
ACM Transactions on Computer Systems (TOCS)
An analytic model of multistage interconnection networks
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An evaluation of memory consistency models for shared-memory systems with ILP processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
AMVA techniques for high service time variability
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
HLS: combining statistical and symbolic simulation to guide microprocessor designs
Proceedings of the 27th annual international symposium on Computer architecture
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems
IEEE Transactions on Software Engineering
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Correlation between Detailed and Simplified Simulations in Studying Multiprocessor Architecture
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Performance impacts of autocorrelated flows in multi-tiered systems
Performance Evaluation
The Journal of Supercomputing
Information-Knowledge-Systems Management
Information-Knowledge-Systems Management
Hi-index | 0.00 |
This paper develops and validates an efficient analytical model for evaluating the performance of shared memory architectures with ILP processors. First, we instrument the SimOS simulator to measure the parameters for such a model and we find a surprisingly high degree of processor memory request heterogeneity in the workloads. Examining the model parameters provides insight into application behaviors and how they interact with the system. Second, we create a model that captures such heterogeneous processor behavior, which is important for analyzing memory system design tradeoffs. Highly bursty memory request traffic and lock contention are also modeled in a significantly more robust way than in previous work. With these features, the model is applicable to a wide range of architectures and applications. Although the features increase the model complexity, it is a useful design tool because the size of the model input parameter set remains manageable, and the model is still several orders of magnitude quicker to solve than detailed simulation. Validation results show that the model is highly accurate, producing heterogeneous per processor throughputs that are generally within 5 percent and, for the workloads validated, always within 13 percent of the values measured by detailed simulation with SimOS. Several examples illustrate applications of the model to studying architectural design issues and the interactions between the architecture and the application workloads.