EMP: zero-copy OS-bypass NIC-driven gigabit ethernet message passing
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Support for MPI at the Network Interface Level
Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Message passing and shared address space parallelism on an SMP cluster
Parallel Computing
The Impact of MPI Queue Usage on Message Latency
ICPP '04 Proceedings of the 2004 International Conference on Parallel Processing
A Hardware Acceleration Unit for MPI Queue Processing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
An Evaluation of Two Implementation Strategies for Optimizing One-Sided Atomic Reduction
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
A Preliminary Analysis of the MPI Queue Characteristics of Several Applications
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
Productivity and performance using partitioned global address space languages
Proceedings of the 2007 international workshop on Parallel symbolic computation
Implications of application usage characteristics for collective communication offload
International Journal of High Performance Computing and Networking
Natively Supporting True One-Sided Communication in MPI on Multi-core Systems with InfiniBand
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
A speculative and adaptive MPI rendezvous protocol over RDMA-enabled interconnects
International Journal of Parallel Programming
Network Interface Architecture for Scalable Message Queue Processing
ICPADS '09 Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems
The Importance of Non-Data-Communication Overheads in MPI
International Journal of High Performance Computing Applications
Characteristics of the unexpected message queue of MPI applications
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Writing parallel libraries with MPI - common practice, issues, and extensions
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Parallelizing BLAST and SOM Algorithms with MapReduce-MPI Library
IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Investigating Scenario-Conscious Asynchronous Rendezvous over RDMA
CLUSTER '11 Proceedings of the 2011 IEEE International Conference on Cluster Computing
IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Performance Evaluation of Open MPI on Cray XE/XK Systems
HOTI '12 Proceedings of the 2012 IEEE 20th Annual Symposium on High-Performance Interconnects
An evaluation of open MPI's matching transport layer on the Cray XT
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
An Efficient MPI Message Queue Mechanism for Large-scale Jobs
ICPADS '12 Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems
Hi-index | 0.00 |
The Message Passing Interface (MPI) message queues have been shown to grow proportionately to the job size for many applications. With such a behaviour and knowing that message queues are used very frequently, ensuring fast queue operations at large scales is of paramount importance in the current and the upcoming exascale computing eras. Scalability, however, is two-fold. With the growing processor core density per node, and the expected smaller memory density per core at larger scales, a queue mechanism that is blind on memory requirements poses another scalability issue even if it solves the speed of operation problem. In this work we propose a multidimensional queue management mechanism whose operation time and memory overhead grow sub-linearly with the job size. We show why a novel approach is justified in spite of the existence of well-known and fast data structures such as binary search trees. We compare our proposal with a linked list-based approach which is not scalable in terms of speed of operation, and with an array-based method which is not scalable in terms of memory consumption. Our proposed multidimensional approach yields queue operation time speedups that translate to up to 4-fold execution time improvement over the linked list design for the applications studied in this work. It also shows a consistent lower memory footprint compared to the array-based design. Finally, compared to the linked list-based queue, our proposed design yields cache miss rate improvements which are on average on par with the array-based design.