Performance analysis of user-level PIM communication in the data intensive architecture (DIVA) system

Authors:
Sumit Dharampal Mediratta; Jeffrey Draper
Affiliations:
USC Information Sciences Institute, Marina del Rey; USC Information Sciences Institute, Marina del Rey
Venue:
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Year:
2005

Abstract

The performance of user-level messaging in PIM-to-PIM (Processing-In-Memory) communication is modeled and analyzed for the DIVA (Data IntensiVe Architecture) system. Six benchmarks are used for this purpose, two from each of the three categories defined by the PMB (Pallas MPI Benchmarks): single message transfer, parallel transfer, and collective communication. The benchmarks are PingPong, PingPing, SendReceive, Exchange, Barrier synchronization, and AllToAll personalized exchange. The main significance of this work lies in evaluating an implementation of system-wide support for memory-to-memory and memory-to-host communication through a parcel buffer, which serves as the network interface. The evaluation also presents an optimal algorithm for Barrier synchronization and an optimal algorithm, with full channel utilization, for AllToAll personalized exchange on a bidirectional ring of up to 8 DIVA PIMs in the memory system of a Hewlett-Packard zx6000 server. The algorithms can be scaled to a larger number of PIM chips with only a small degradation in performance relative to the optimal solution. Our analysis shows that the current communication mechanism can be used very efficiently for collective communication operations, and it also exposes bottlenecks in the current design that can guide future improvements.
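The abstract does not describe the AllToAll algorithm itself. The sketch below is a hedged illustration of what "optimal, with full channel utilization" can mean on a bidirectional ring. It simulates one standard message-combining schedule, which is not necessarily the authors' algorithm, under a simplified cost model assumed only for this sketch (not the DIVA parcel-buffer implementation):

- every node holds one unit-size message for every other node;
- each directed link moves one unit per time unit;
- per-message startup cost is ignored;
- the message to the antipodal node is split in half across the two directions.

The function name `ring_alltoall` is hypothetical. In this model, a schedule is optimal when its completion time equals the lower bound given by the total shortest-path traffic divided by the 2p directed links.

```python
from fractions import Fraction


def ring_alltoall(p):
    """Hypothetical sketch: combining all-to-all personalized exchange on a
    bidirectional ring of p nodes (p even), under an assumed cost model with
    unit messages, unit link bandwidth per direction, and no startup cost.

    Returns (completion_time, lower_bound, full_use, received), where
    received[dst][src] is the amount of src's message that reached dst.
    """
    assert p >= 2 and p % 2 == 0, "sketch assumes an even ring size"
    half = p // 2
    # buffers[direction][node] holds pieces (src, dst, size) waiting to be
    # forwarded; direction +1 = clockwise, -1 = counterclockwise.
    buffers = {+1: [[] for _ in range(p)], -1: [[] for _ in range(p)]}
    for src in range(p):
        for dst in range(p):
            d = (dst - src) % p
            if d == 0:
                continue
            if d < half:
                buffers[+1][src].append((src, dst, Fraction(1)))
            elif d > half:
                buffers[-1][src].append((src, dst, Fraction(1)))
            else:  # antipodal node: split the message across both directions
                buffers[+1][src].append((src, dst, Fraction(1, 2)))
                buffers[-1][src].append((src, dst, Fraction(1, 2)))

    received = [[Fraction(0)] * p for _ in range(p)]
    link_busy = {(n, dirn): Fraction(0) for n in range(p) for dirn in (+1, -1)}
    total_time = Fraction(0)
    while any(buffers[dirn][n] for dirn in (+1, -1) for n in range(p)):
        step_time = Fraction(0)
        new_buffers = {+1: [[] for _ in range(p)], -1: [[] for _ in range(p)]}
        for dirn in (+1, -1):
            for n in range(p):
                # Combine everything this node must forward into one message.
                size = sum((s for _, _, s in buffers[dirn][n]), Fraction(0))
                link_busy[(n, dirn)] += size
                # All links run in parallel; a step lasts as long as its
                # largest combined message.
                step_time = max(step_time, size)
                nbr = (n + dirn) % p
                for src, dst, s in buffers[dirn][n]:
                    if dst == nbr:
                        received[dst][src] += s  # arrived at its destination
                    else:
                        new_buffers[dirn][nbr].append((src, dst, s))
        buffers = new_buffers
        total_time += step_time

    # Lower bound: total shortest-path traffic spread over all 2p directed links.
    hops = sum(min((dst - src) % p, (src - dst) % p)
               for src in range(p) for dst in range(p))
    lower_bound = Fraction(hops, 2 * p)
    # Full channel utilization: every directed link is busy for the whole run.
    full_use = all(b == total_time for b in link_busy.values())
    return total_time, lower_bound, full_use, received


t, lb, full, _ = ring_alltoall(8)
print(t, lb, full)  # prints "8 8 True"
```

For an 8-node ring, the four combining steps carry 3.5, 2.5, 1.5 and 0.5 units per link. The total of 8 units equals the lower bound of 128 unit-hops over 16 directed links. The sketch combines messages to avoid repeated startup costs and splits the antipodal message to balance the two directions. Real parcel-buffer overheads on DIVA would add costs that this model leaves out.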