Evaluating and analyzing the performance of a parallel application on an architecture, in order to explain the disparity between projected and delivered performance, is an important aspect of parallel systems research. However, conducting such a study is hard because of the vast design space of these systems. In this paper, we study two important aspects of the performance of parallel applications on shared memory parallel architectures. First, we quantify the overheads observed during the execution of these applications on three different simulated architectures. We then use these results to synthesize the bandwidth requirements of the applications for different network topologies. This study is performed using an execution-driven simulation tool called SPASM, which isolates and quantifies the different parallel system overheads in a nonintrusive manner. The first exercise shows that in shared memory machines with private caches, as long as the applications are well structured to exploit locality, the key determinant of performance is network contention. The second exercise quantifies the network bandwidth needed to minimize the effect of network contention. Specifically, it is shown that for the applications considered, as long as problem sizes are increased commensurately with system size, current network technologies supporting 200-300 MBytes/sec link bandwidth are sufficient to keep network overheads (such as latency and contention) within acceptable bounds.
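The two exercises rest on splitting network overhead into a latency part (the time a message would take in a contention-free network) and a contention part (the extra time caused by competing for network resources), and then asking how much link bandwidth keeps the contention part small. The sketch below illustrates that idea under stated assumptions; it is not SPASM's implementation. The `Message` record, the latency model with its `per_hop_us` and `software_us` parameters, and the candidate bandwidths are all hypothetical. The bandwidth search also swaps in a textbook M/M/1 queue per link as a stand-in for contention, whereas the paper derives its requirements from simulation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One simulated network message (hypothetical record format)."""
    size_bytes: int
    hops: int
    observed_time_us: float  # time measured in the contended network


def contention_free_time_us(msg, link_mb_per_s, per_hop_us=0.1, software_us=1.0):
    # Assumed latency model: fixed software cost + per-hop switching delay
    # + serialization time. 1 MB/s (10^6 bytes/s) equals 1 byte per microsecond.
    return software_us + msg.hops * per_hop_us + msg.size_bytes / link_mb_per_s


def split_overheads(msgs, link_mb_per_s):
    """Return (latency_us, contention_us) summed over all messages."""
    latency = 0.0
    contention = 0.0
    for m in msgs:
        ideal = contention_free_time_us(m, link_mb_per_s)
        latency += ideal
        # Time beyond the contention-free estimate is attributed to contention.
        contention += max(0.0, m.observed_time_us - ideal)
    return latency, contention


def mm1_wait_us(msg_rate_per_us, size_bytes, link_mb_per_s):
    # Stand-in contention model (assumption): M/M/1 queueing delay on one link.
    service = size_bytes / link_mb_per_s
    rho = msg_rate_per_us * service  # link utilization
    if rho >= 1.0:
        return float("inf")  # saturated link
    return rho / (1.0 - rho) * service


def min_bandwidth(msg_rate_per_us, size_bytes, candidates, max_wait_us):
    """Smallest candidate link bandwidth (MB/s) meeting the wait bound, else None."""
    for bw in sorted(candidates):
        if mm1_wait_us(msg_rate_per_us, size_bytes, bw) <= max_wait_us:
            return bw
    return None


if __name__ == "__main__":
    msgs = [Message(64, 3, 2.0), Message(128, 2, 3.0)]
    print(split_overheads(msgs, link_mb_per_s=100.0))
    print(min_bandwidth(0.5, 64, [50, 100, 200, 300], max_wait_us=0.1))
```

With 64-byte messages arriving at 0.5 per microsecond on a link, the search rejects 50 and 100 MB/s and returns 200 MB/s for a 0.1 µs queueing bound. Under this toy model, tightening the contention bound pushes the required bandwidth up, which mirrors the kind of trade-off the second exercise quantifies.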