Towards an architecture-independent analysis of parallel algorithms
SIAM Journal on Computing
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The impact of communication locality on large-scale multiprocessor performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Fast Parallel Sorting Under LogP: Experience with the CM-5
IEEE Transactions on Parallel and Distributed Systems
LoPC: modeling contention in parallel algorithms
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Assessing Fast Network Interfaces
IEEE Micro
Limits on Interconnection Network Performance
IEEE Transactions on Parallel and Distributed Systems
On the time complexity of broadcast communication schemes (Preliminary Version)
STOC '82 Proceedings of the fourteenth annual ACM symposium on Theory of computing
Exploiting Two-Case Delivery for Fast Protected Messaging
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Sensitivity of Communication Mechanisms to Bandwidth and Latency
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors
The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors
SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Incorporating memory layout in the modeling of message passing programs
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Quantification of memory communication
High performance scientific and engineering computing
Opportunities and challenges in application-tuned circuits and architectures based on nanodevices
Proceedings of the 1st conference on Computing frontiers
Journal of Parallel and Distributed Computing
The impact of spatial layout of jobs on I/O hotspots in mesh networks
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
Optimizing communication overlap for high-speed networks
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Journal of Parallel and Distributed Computing
Adaptive approaches for efficient parallel algorithms on cluster-based systems
International Journal of Grid and Utility Computing
Modeling advanced collective communication algorithms on cell-based systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Assessing contention effects on MPI_alltoall communications
GPC'07 Proceedings of the 2nd international conference on Advances in grid and pervasive computing
Manycore performance analysis using timed configuration graphs
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
LogGOPSim: simulating large-scale applications in the LogGOPS model
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Toward performance models of MPI implementations for understanding application scaling issues
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Incorporating memory layout in the modeling of message passing programs
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Fast barrier synchronization for InfiniBand™
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
LogfP - a model for small messages in InfiniBand
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A contention-aware performance model for HPC-based networks: a case study of the InfiniBand network
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Total exchange performance modelling under network contention
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Hi-index | 0.00 |
In many real applications, for example, those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP [9] and LogGP [4] models to account for the impact of network contention and network interface DMA behavior on the performance of message passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages [11], [19] on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50 percent of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify trade-offs between synchronous and asynchronous message passing styles.