The rice parallel processing testbed
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Parallel depth first search. Part II. analysis
International Journal of Parallel Programming
The effect of sharing on the cache and bus performance of parallel programs
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Measuring parallel processor performance
Communications of the ACM
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Architectural requirements of parallel scientific applications with explicit communication
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The impact of synchronization and granularity on parallel systems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Limits on Interconnection Network Performance
IEEE Transactions on Parallel and Distributed Systems
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
PROTEUS: A HIGH-PERFORMANCE PARALLEL-ARCHITECTURE SIMULATOR
SPLASH: Stanford parallel applications for shared-memory
SPLASH: Stanford parallel applications for shared-memory
The complexity of parallel computations
The complexity of parallel computations
Paper: Toward a better parallel performance metric
Parallel Computing
Future applicability of bus-based shared memory multiprocessors
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
On characterizing bandwidth requirements of parallel applications
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A semi-empirical approach to scalability study
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Predicting application behavior in large scale shared-memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Multiprocessor scalability predictions through detailed program execution analysis
ICS '95 Proceedings of the 9th international conference on Supercomputing
Toward a More Realistic Performance Evaluation of Interconnection Networks
IEEE Transactions on Parallel and Distributed Systems
pSNOW: a tool to evaluate architectural issues for NOW environments
ICS '97 Proceedings of the 11th international conference on Supercomputing
Performance benefits of virtual channels and adaptive routing: an application-driven study
ICS '97 Proceedings of the 11th international conference on Supercomputing
Execution-driven simulators for parallel systems design
Proceedings of the 29th conference on Winter simulation
An Application-Driven Study of Parallel System Overheads and Network Bandwidth Requirements
IEEE Transactions on Parallel and Distributed Systems
A Testbed for Evaluation of Fault-Tolerant Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Parallel and Distributed Systems
Impact of Virtual Channels and Adaptive Routing on Application Performance
IEEE Transactions on Parallel and Distributed Systems
Communication in Parallel Applications: Characterization and Sensitivity Analysis
ICPP '97 Proceedings of the international Conference on Parallel Processing
Abstracting network characteristics and locality properties of parallel systems
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Robust scalability analysis and SPM case studies
The Journal of Supercomputing
Extracting and predicting the communication behaviour of parallel applications
International Journal of Parallel, Emergent and Distributed Systems
Model for simulation of heterogeneous high-performance computing environments
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Detecting phases in parallel applications on shared memory architectures
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A new memory slowdown model for the characterization of computing systems
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Hi-index | 0.00 |
The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines. Such overheads may be broadly classified into two components. The first one is intrinsic to the algorithm and arises due to factors such as the work-imbalance and the serial fraction. The second one is due to the interaction between the algorithm and the architecture and arises due to latency and contention in the network. A top-down approach to scalability study of shared memory parallel systems is proposed in this research. We define the notion of overhead functions associated with the different algorithmic and architectural characteristics to quantify the scalability of parallel systems; we isolate the algorithmic overhead and the overheads due to network latency and contention from the overall execution time of an application; we design and implement an execution-driven simulation platform that incorporates these methods for quantifying the overhead functions; and we use this simulator to study the scalability characteristics of five applications on shared memory platforms with different communication topologies.