Analytic Evaluation of Shared-Memory Architectures

Authors:
Daniel J. Sorin;Jonathan L. Lemon;Derek L. Eager;Mary K. Vernon
Affiliations:
-;-;-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2003

Citing 17
Cited 5

Quantitative system performance: computer system analysis using queueing network models

Quantitative system performance: computer system analysis using queueing network models
An analytical cache model

ACM Transactions on Computer Systems (TOCS)
An analytic model of multistage interconnection networks

SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An evaluation of memory consistency models for shared-memory systems with ILP processors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Operating system support for improving data locality on CC-NUMA compute servers

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors

Proceedings of the 25th annual international symposium on Computer architecture
AMVA techniques for high service time variability

Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
HLS: combining statistical and symbolic simulation to guide microprocessor designs

Proceedings of the 27th annual international symposium on Computer architecture
Complete Computer System Simulation: The SimOS Approach

IEEE Parallel & Distributed Technology: Systems & Technology
POEMS: End-to-End Performance Design of Large Parallel Adaptive Computational Systems

IEEE Transactions on Software Engineering
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture

Correlation between Detailed and Simplified Simulations in Studying Multiprocessor Architecture

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Performance impacts of autocorrelated flows in multi-tiered systems

Performance Evaluation
Active memory controller

The Journal of Supercomputing
Analysis and modeling of service impacts on system activities, resource workloads and service performance on computer and network systems

Information-Knowledge-Systems Management
System impact characteristics of cyber services, security mechanisms, and attacks with implications in cyber system survivability

Information-Knowledge-Systems Management

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper develops and validates an efficient analytical model for evaluating the performance of shared memory architectures with ILP processors. First, we instrument the SimOS simulator to measure the parameters for such a model and we find a surprisingly high degree of processor memory request heterogeneity in the workloads. Examining the model parameters provides insight into application behaviors and how they interact with the system. Second, we create a model that captures such heterogeneous processor behavior, which is important for analyzing memory system design tradeoffs. Highly bursty memory request traffic and lock contention are also modeled in a significantly more robust way than in previous work. With these features, the model is applicable to a wide range of architectures and applications. Although the features increase the model complexity, it is a useful design tool because the size of the model input parameter set remains manageable, and the model is still several orders of magnitude quicker to solve than detailed simulation. Validation results show that the model is highly accurate, producing heterogeneous per processor throughputs that are generally within 5 percent and, for the workloads validated, always within 13 percent of the values measured by detailed simulation with SimOS. Several examples illustrate applications of the model to studying architectural design issues and the interactions between the architecture and the application workloads.