PRAM (Parallel Random Access Model) has been widely regarded as a desirable parallel machine model for many years, but it is also believed to be "impossible in reality." As the new billion-transistor processor era begins, the eXplicit Multi-Threading (XMT) PRAM-On-Chip project is attempting to design an on-chip parallel processor that efficiently supports PRAM algorithms. This paper presents the first prototype of the XMT architecture, which incorporates 64 simple in-order processors operating at 75 MHz. The microarchitecture of the prototype is described, and its performance is studied on a set of micro-benchmarks. Using cycle-accurate emulation, the projected performance of an 800 MHz XMT ASIC processor is compared with that of a 2.6 GHz AMD Opteron, which occupies an area similar to that of a 64-processor ASIC version of the XMT prototype. The results suggest that an XMT ASIC system running at only 800 MHz would outperform the 2.6 GHz AMD Opteron, with speedups ranging from 1.57 to 8.56.