An O(n2 log n) parallel max-flow algorithm
Journal of Algorithms
An introduction to parallel algorithms
An introduction to parallel algorithms
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
ICS '90 Proceedings of the 4th international conference on Supercomputing
From algorithm parallelism to instruction-level parallelism: an encode-decode chain using prefix-sum
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Explicit multi-threading (XMT) bridging models for instruction parallelism (extended abstract)
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Practical Pram Programming
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing-Volume II
HOTI '07 Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects
Fpga-based prototype of a pram-on-chip processor
Proceedings of the 5th conference on Computing frontiers
Case study of gate-level logic simulation on an extremely fine-grained chip multiprocessor
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
An area-efficient high-throughput hybrid interconnection network for single-chip parallel processing
Proceedings of the 45th annual Design Automation Conference
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Using simple abstraction to reinvent computing for parallelism
Communications of the ACM
Hi-index | 0.02 |
Our earlier parallel algorithmics work on the parallel random-access-machine/model (PRAM) computation model led us to a PRAM-On-Chip vision: a comprehensive many-core system that can look to the programmer like the abstract PRAM model. We introduced the eXplicit Multi-Threaded (XMT) design and prototyped it in hardware and software. XMT comprises a programmer's workflow that advances from work-depth, a standard PRAM theory abstraction, to an XMT program, and, if desired, to its performance tuning. XMT provides strong performance for programs developed this way due to its hardware support of very fine-grained threads and the overhead of handling them. XMT has also shown unique promise when it comes to ease-of-programming, the biggest problem that has limited the impact of all parallel systems to date. For example, teachability of XMT programming has been demonstrated at various levels from rising 6th graders to graduate students, and students in a freshman class were able to program 3 parallel sorting algorithms. The main purpose of the current paper is to stimulate discussion on the following somewhat open-ended question. Now that we made significant progress on a system devoted to supporting PRAM-like programming, is it possible to incorporate our hardware support as an add-on into other current and future many-core systems? The paper considers a concrete proposal for doing that: recasting our work as a hardware-enhanced programmer's workflow "module" that can then be essentially imported into the other systems.