Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
We present a new model of parallel computation, the LogGP model, and use it to analyze a number of algorithms, most notably the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation, which abstracts the communication of fixed-size short messages through four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5). However, many existing parallel machines have special support for long messages and achieve much higher bandwidth for long messages than for short ones (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has one additional parameter, G, which captures the bandwidth obtained for long messages. Experimental data collected on the Meiko CS-2 show that this simple extension of the LogP model can quite accurately predict communication performance for both short and long messages. This paper discusses algorithm design and analysis under the new model, examining the all-to-all remap, FFT, and radix sort. We also examine the single node scatter problem in more detail. We derive solutions for this problem and prove their optimality under the LogGP model. These solutions are qualitatively different from those obtained under the simpler LogP model, reflecting the importance of capturing long messages in a model.
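To make the role of the extra parameter G concrete, the following is a minimal sketch of the point-to-point cost commonly associated with the two models. It assumes the usual reading of the parameters: g is the minimum gap between consecutive short messages (the reciprocal of short-message bandwidth) and G is the gap per byte within a long message. The class name, function names, and numeric values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    """Model parameters; all times share one unit (e.g. cycles)."""
    L: float  # network latency
    o: float  # per-message processor overhead (send or receive)
    g: float  # minimum gap between consecutive short messages
    G: float  # gap per byte within one long message
    P: int    # number of processors


def long_message_time(m: LogGP, k: int) -> float:
    """Time to deliver one k-byte message using long-message support.

    Send overhead, then (k - 1) further bytes spaced G apart,
    then latency, then receive overhead.
    """
    if k < 1:
        raise ValueError("message must contain at least one byte")
    return m.o + (k - 1) * m.G + m.L + m.o


def short_messages_time(m: LogGP, k: int) -> float:
    """Time to deliver k bytes as k back-to-back one-byte messages (LogP view).

    Successive sends are spaced by max(g, o), since the sender is
    limited by both the gap and its own overhead.
    """
    if k < 1:
        raise ValueError("need at least one message")
    return m.o + (k - 1) * max(m.g, m.o) + m.L + m.o


# Hypothetical parameter values, not taken from any real machine.
machine = LogGP(L=10.0, o=2.0, g=4.0, G=0.25, P=8)

if __name__ == "__main__":
    for k in (1, 1001):
        print(k, long_message_time(machine, k), short_messages_time(machine, k))
```

With these illustrative values, a single byte costs the same under both formulas, while a 1001-byte transfer is far cheaper as one long message than as 1001 short ones. This is the gap between short-message and long-message bandwidth that G is meant to capture.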