As processor speeds continue to increase at a much higher rate than memory speeds, memory latencies may soon approach a thousand processor cycles. As a result, the flat memory model that was made practical by deeply pipelined superscalar processors with multilevel caches will no longer be tenable. The most common approach to this problem is multithreading; however, multithreading requires either abundant independent applications or well-parallelized monolithic applications, and neither is easy to come by. We present high-level programming constructs called braids and fibers. These constructs facilitate the creation of partially ordered programs, in which the partial orders can be used to support adaptive responses to memory access latencies. Braiding is simpler than parallelizing, while yielding many of the same benefits. We show how the constructs can be effectively supported with simple instruction set architecture extensions and microarchitectural enhancements. We have developed braided versions of a number of important algorithms. The braided code is easy to understand at the source level and can be translated into highly efficient instructions using our architecture extensions.