While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, doing so is often difficult for practical reasons. This report presents our study of multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable thread granularity. Several workload distribution strategies are defined according to thread granularity. Three data distribution strategies are investigated: row-wise cyclic, k-way partial-row cyclic, and blocked distribution. We have implemented all of these on the 80-processor EM-4 distributed-memory multiprocessor using two problems: the highly sequential Gaussian Elimination with Partial Pivoting and the highly parallel Matrix Multiplication. Experimental results indicate that multithreading can offset the loss due to a mismatch between data distribution and workload distribution, even for sequential and irregular problems, while giving high absolute performance.
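
The three data distribution strategies named above can be illustrated by the processor that owns a given matrix element. The following is a minimal sketch and not the authors' implementation: the function names are hypothetical, and the k-way partial-row cyclic mapping in particular is an assumed reading of the term (each row is cut into k segments, and the segments are dealt out cyclically), since the abstract does not define it.

```python
from math import ceil


def row_cyclic_owner(i, num_procs):
    """Row-wise cyclic: row i goes to processor i mod P."""
    return i % num_procs


def blocked_owner(i, n_rows, num_procs):
    """Blocked: contiguous groups of ceil(n_rows / P) rows per processor."""
    rows_per_proc = ceil(n_rows / num_procs)
    return i // rows_per_proc


def partial_row_cyclic_owner(i, j, n_cols, k, num_procs):
    """k-way partial-row cyclic (ASSUMED definition, not from the source).

    Each row is split into k segments of ceil(n_cols / k) columns, and the
    segments of all rows are dealt to processors in cyclic order.
    """
    seg_len = ceil(n_cols / k)
    segment = j // seg_len
    return (i * k + segment) % num_procs


if __name__ == "__main__":
    n, procs = 8, 4
    print("row-wise cyclic:", [row_cyclic_owner(i, procs) for i in range(n)])
    print("blocked:        ", [blocked_owner(i, n, procs) for i in range(n)])
    for i in range(2):
        owners = [partial_row_cyclic_owner(i, j, n, 2, procs) for j in range(n)]
        print(f"2-way partial-row cyclic, row {i}:", owners)
```

Under this assumed definition, setting k = 1 reduces the partial-row mapping to the row-wise cyclic one, which makes the two easy to compare when the workload distribution is varied.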