Analytic evaluation of shared-memory systems with ILP processors

Authors:
Daniel J. Sorin;Vijay S. Pai;Sarita V. Adve;Mary K. Vernon;David A. Wood
Affiliations:
Computer Sciences Dept, University of Wisconsin-Madison;Dept of Electrical & Computer Engineering, Rice University;Dept of Electrical & Computer Engineering, Rice University;Computer Sciences Dept, University of Wisconsin-Madison;Computer Sciences Dept, University of Wisconsin-Madison
Venue:
Proceedings of the 25th annual international symposium on Computer architecture
Year:
1998

Citing 22
Cited 34

Quantitative system performance: computer system analysis using queueing network models

Quantitative system performance: computer system analysis using queueing network models
An accurate and efficient performance analysis technique for multiprocessor snooping cache-consistency protocols

ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
An analytic model of multistage interconnection networks

SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Comparison of hardware and software cache coherence schemes

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
SPLASH: Stanford parallel applications for shared-memory

ACM SIGARCH Computer Architecture News
Evaluating Design Choices for Shared Bus Multiprocessors in a Throughput-Oriented Environment

IEEE Transactions on Computers
The influence of random delays on parallel execution times

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The Stanford FLASH multiprocessor

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The performance impact of flexibility in the Stanford FLASH multiprocessor

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An integrated compilation and performance analysis environment for data parallel programs

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Decoupled hardware support for distributed shared memory

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Application and architectural bottlenecks in large scale distributed shared memory machines

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Embra: fast and flexible machine simulation

Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Memory-system design considerations for dynamically-scheduled processors

Proceedings of the 24th annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors

Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Analyzing queueing networks with simultaneous resource possession

Communications of the ACM
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
The Impact of Instruction-Level Parallelism on Multiprocessor Performance and Simulation Methodology

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture

AMVA techniques for high service time variability

Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
An analytical model of the working-set sizes in decision-support systems

Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Performance prediction for random write reductions: a case study in modeling shared memory programs

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Hardware-Assisted Characterization of NAS Benchmarks

Cluster Computing
Queuing Simulation Model for Multiprocessor Systems

Computer
Analytic Evaluation of Shared-Memory Architectures

IEEE Transactions on Parallel and Distributed Systems
Mean Value Analysis: a Personal Account

Performance Evaluation: Origins and Directions
Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation

IEEE Transactions on Computers
Microarchitecture Optimizations for Exploiting Memory-Level Parallelism

Proceedings of the 31st annual international symposium on Computer architecture
A First-Order Superscalar Processor Model

Proceedings of the 31st annual international symposium on Computer architecture
Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies

Proceedings of the 31st annual international symposium on Computer architecture
Comparison of analytic performance models using closed mean-value analysis versus open-queuing theory for estimating cycles per instruction of memory hierarchies

IBM Journal of Research and Development
Comprehensive multiprocessor cache miss rate generation using multivariate models

ACM Transactions on Computer Systems (TOCS)
A methodology for detailed performance modeling of reduction computations on SMP machines

Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Correlation between Detailed and Simplified Simulations in Studying Multiprocessor Architecture

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Store Memory-Level Parallelism Optimizations for Commercial Applications

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A case study in top-down performance estimation for a large-scale parallel application

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Achieving structural and composable modeling of complex systems

International Journal of Parallel Programming - Special issue: The next generation software program
Comprehensive multivariate extrapolation modeling of multiprocessor cache miss rates

ACM Transactions on Computer Systems (TOCS)
Determining output uncertainty of computer system models

Performance Evaluation
An analysis of the effects of miss clustering on the cost of a cache miss

Proceedings of the 4th international conference on Computing frontiers
Pipeline spectroscopy

Proceedings of the 2007 workshop on Experimental computer science
The case for simple, visible cache coherency

Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
Toward a multicore architecture for real-time ray-tracing

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A mechanistic performance model for superscalar out-of-order processors

ACM Transactions on Computer Systems (TOCS)
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

Proceedings of the 36th annual international symposium on Computer architecture
Defining relevant distances between server workloads

Performance Evaluation
Queuing theoretic model for a multiprocessor with private caches and shared memory

ACM SIGARCH Computer Architecture News
Accelerating multi-core simulators

Proceedings of the 2010 ACM Symposium on Applied Computing
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Active memory controller

The Journal of Supercomputing
Predicting Performance Impact of DVFS for Realistic Memory Systems

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A queueing theoretic approach for performance evaluation of low-power multi-core embedded systems

Journal of Parallel and Distributed Computing

Abstract

This paper develops and validates an analytical model for evaluating various types of architectural alternatives for shared-memory systems with processors that aggressively exploit instruction-level parallelism. Compared to simulation, the analytical model is many orders of magnitude faster to solve, yielding highly accurate system performance estimates in seconds. The model input parameters characterize the ability of an application to exploit instruction-level parallelism as well as the interaction between the application and the memory system architecture. A trace-driven simulation methodology is developed that allows these parameters to be generated over 100 times faster than with a detailed execution-driven simulator. Finally, this paper shows that the analytical model can be used to gain insights into application performance and to evaluate architectural design trade-offs.