Message assembly and disassembly account for a significant fraction of total communication time in many parallel systems. We introduce a run-time approach for fast message assembly and disassembly. The approach generates addresses by decoding a precomputed, compactly stored address relation that maps addresses on the source node to addresses on the destination node. The main result is that relations induced by redistributions of regular block-cyclic distributed arrays can be encoded in an extremely compact form that facilitates high-throughput message assembly and disassembly. We measure the throughput of decoding-based message assembly and disassembly on several systems and find it on par with copy throughput.
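To make the idea of a precomputed address relation concrete, here is a minimal Python sketch of a block-cyclic redistribution. It is an illustration under stated assumptions, not the encoding used in the paper: the run-length format `(src_start, dst_start, length)` and every function name (`owner`, `local_addr`, `address_relation`, `encode_runs`, `assemble`, `disassemble`) are hypothetical, and a distribution is assumed to be a `(block_size, num_procs)` pair over a one-dimensional array.

```python
def owner(g, block, procs):
    """Processor that owns global index g under a block-cyclic distribution."""
    return (g // block) % procs


def local_addr(g, block, procs):
    """Local address of global index g on its owning processor."""
    return (g // (block * procs)) * block + g % block


def address_relation(n, src_dist, dst_dist, p, q):
    """Pairs (src_local, dst_local) for elements moving from source p to destination q."""
    (b1, procs1), (b2, procs2) = src_dist, dst_dist
    return [
        (local_addr(g, b1, procs1), local_addr(g, b2, procs2))
        for g in range(n)
        if owner(g, b1, procs1) == p and owner(g, b2, procs2) == q
    ]


def encode_runs(pairs):
    """Assumed compact form: merge pairs whose addresses both advance by one."""
    runs = []
    for s, d in pairs:
        if runs and runs[-1][0] + runs[-1][2] == s and runs[-1][1] + runs[-1][2] == d:
            s0, d0, k = runs[-1]
            runs[-1] = (s0, d0, k + 1)  # extend the current run
        else:
            runs.append((s, d, 1))      # start a new run
    return runs


def assemble(src_local, runs):
    """Decode the source side of the relation to gather a contiguous message."""
    msg = []
    for s, _, k in runs:
        msg.extend(src_local[s:s + k])
    return msg


def disassemble(msg, dst_local, runs):
    """Decode the destination side of the relation to scatter the message."""
    pos = 0
    for _, d, k in runs:
        dst_local[d:d + k] = msg[pos:pos + k]
        pos += k
```

For a 16-element array redistributed from block size 4 to block size 2 on two processors, `encode_runs(address_relation(16, (4, 2), (2, 2), 0, 0))` yields `[(0, 0, 2), (4, 4, 2)]`: four address pairs collapse into two runs. The relation is computed once and can be reused for every subsequent transfer with the same distributions, so the per-message work reduces to decoding runs and copying.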