Communication complexity of PRAMs
Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
Introduction to algorithms
Space-efficient scheduling of multithreaded computations
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Can parallel algorithms enhance serial implementation?
Communications of the ACM
Space-efficient scheduling of parallelism with synchronization variables
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Explicit multi-threading (XMT) bridging models for instruction parallelism (extended abstract)
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
ICS '01 Proceedings of the 15th international conference on Supercomputing
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The Memory Bandwidth Bottleneck and its Amelioration by a Compiler
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Hi-index | 0.00 |
The utility of algorithm parallelism for coping with increased processor to memory latencies using "latency hiding" is part of the folklore of parallel computing. Latency hiding techniques increase the traffic to memory and therefore may "hit another wall": limited bandwidth to memory. The current paper attempts to stimulate research in the following general direction: show that algorithm parallelism need not conflict with limited bandwidth.A general technique for using parallel algorithms to enhance serial implementation in the face of processor-memory latency problems is revisited. Two techniques for alleviating memory bandwidth constraints are presented. Both techniques can be incorporated in a compiler.There is often considerable parallelism in many of the algorithms which are known as useful serial algorithms. Interestingly enough, all the examples provided for the use of the two techniques come from such serial algorithms.