Cost/performance of a parallel computer simulator

Authors:
Babak Falsafi; David A. Wood
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI (both authors)
Venue:
PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Year:
1994

References (18):
Allocating Independent Subtasks on Parallel Processors. IEEE Transactions on Software Engineering.
Distributed discrete-event simulation. ACM Computing Surveys (CSUR).
Footprints in the cache. ACM Transactions on Computer Systems (TOCS).
The rice parallel processing testbed. SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems.
Efficient distributed event-driven simulations of multiple-loop networks. Communications of the ACM.
Memory coherence in shared virtual memory systems. ACM Transactions on Computer Systems (TOCS).
Parallel discrete event simulation. Communications of the ACM - Special issue on simulation.
SPLASH: Stanford parallel applications for shared-memory. ACM SIGARCH Computer Architecture News.
The network architecture of the Connection Machine CM-5 (extended abstract). SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures.
Cooperative shared memory: software and hardware for scalable multiprocessors. ACM Transactions on Computer Systems (TOCS).
Mechanisms for cooperative shared memory. ISCA '93 Proceedings of the 20th annual international symposium on computer architecture.
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers. SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems.
Efficient parallel simulation for designing multiprocessor systems.
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors. Proceedings of the 1993 ACM/IEEE conference on Supercomputing.
Scaling Parallel Programs for Multiprocessors: Methodology and Examples. Computer.
Conservative Parallel Simulation of Priority Class Queuing Networks. IEEE Transactions on Parallel and Distributed Systems.
PROTEUS: A High-Performance Parallel-Architecture Simulator.
Fast Accurate Simulation of Large Shared Memory Multiprocessors.

Cited by (6):
Optimistic simulation of parallel architectures using program executables. PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation.
Modeling cost/performance of a parallel computer simulator. ACM Transactions on Modeling and Computer Simulation (TOMACS).
A performance analysis model for distributed simulations. WSC '96 Proceedings of the 28th conference on Winter simulation.
Profit-Effective Parallel Computing. IEEE Concurrency.
Cost-Effective Parallel Computing. Computer.
A model for parallel simulation of distributed shared memory. MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems.

Abstract

This paper examines the cost/performance of simulating a hypothetical target parallel computer using a commercial host parallel computer. We address the question of whether parallel simulation is simply faster than sequential simulation, or whether it is also more cost-effective. To answer this, we develop a performance model of the Wisconsin Wind Tunnel (WWT), a system that simulates cache-coherent shared-memory machines on a message-passing Thinking Machines CM-5. The performance model uses Kruskal and Weiss's fork-join model to account for the effect of event processing time variability on WWT's conservative fixed-window simulation algorithm. A generalization of Thiebaut and Stone's footprint model accurately predicts the effect of cache interference on the CM-5. The model is calibrated using parameters extracted from a fully parallel simulation (p=N), and validated by measuring the speedup as the number of processors (p) ranges from one to the number of target nodes (N). Together with simple cost models, the performance model indicates that for target system sizes of 32 nodes and larger, parallel simulation is more cost-effective than sequential simulation. The key intuition behind this result is that large simulations require large memories, which dominate the cost of a uniprocessor; parallel computers allow multiple processors to simultaneously access this large memory.
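
The memory intuition in the abstract can be illustrated with a small sketch. It assumes one common framing of cost-effectiveness, namely that a parallel run is cost-effective when its speedup exceeds its "costup" (parallel system cost divided by uniprocessor cost). That criterion, the linear cost function, and every name and number below are illustrative assumptions; none of them is taken from the paper's own cost models.

```python
# Hypothetical sketch: the cost function and all parameters are assumptions
# made for illustration, not the cost models used in the paper.

def system_cost(processors, memory_mb, cpu_cost, mem_cost_per_mb):
    """Cost of a host with `processors` CPUs sharing `memory_mb` of memory in total."""
    return processors * cpu_cost + memory_mb * mem_cost_per_mb


def costup(processors, memory_mb, cpu_cost, mem_cost_per_mb):
    """Parallel cost relative to a uniprocessor holding the same total memory.

    The simulation needs memory for every target node whether it runs on
    one host processor or many, so memory_mb is the same in both cases.
    """
    parallel = system_cost(processors, memory_mb, cpu_cost, mem_cost_per_mb)
    sequential = system_cost(1, memory_mb, cpu_cost, mem_cost_per_mb)
    return parallel / sequential


def is_cost_effective(speedup, processors, memory_mb, cpu_cost, mem_cost_per_mb):
    """Assumed criterion: cost-effective when speedup exceeds costup."""
    return speedup > costup(processors, memory_mb, cpu_cost, mem_cost_per_mb)


if __name__ == "__main__":
    # Made-up prices: 1000 per CPU, 100 per MB of memory.
    for memory_mb in (32, 1024):
        c = costup(32, memory_mb, cpu_cost=1000, mem_cost_per_mb=100)
        ok = is_cost_effective(8, 32, memory_mb, cpu_cost=1000, mem_cost_per_mb=100)
        print(f"memory={memory_mb} MB  costup={c:.2f}  cost-effective at speedup 8: {ok}")
```

With these made-up prices, a speedup of 8 on 32 processors is not cost-effective when the simulation needs little memory, but becomes cost-effective once memory dominates the uniprocessor's cost, because adding processors then raises total cost only slightly.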