In this paper we quantify the effect of this trend in multiprocessor architecture on parallel program performance. Our experiments on bus-based, cache-coherent machines such as the Sequent Symmetry, and on large-scale distributed-memory machines such as the BBN Butterfly, demonstrate that applications scale much better on previous-generation machines than on current ones. We also show that some scalable machines support fine-grain, shared-memory programs better than some bus-based, cache-coherent machines do, without significantly greater programming effort. From these experiments we conclude that communication has become a dominant source of inefficiency in shared-memory multiprocessors. This has serious consequences for the system software that makes scheduling and decomposition decisions. In particular, we argue that shared-memory programming models that could be implemented efficiently on yesterday's machines do not port readily to state-of-the-art machines, and that current software trends in support of fine-grain parallel programming are at odds with hardware trends.