FPGA-based prototype of a PRAM-on-chip processor

Authors:
Xingzhi Wen;Uzi Vishkin
Affiliations:
University of Maryland, College Park, MD, USA;University of Maryland, College Park, MD, USA
Venue:
Proceedings of the 5th conference on Computing frontiers
Year:
2008

Abstract

PRAM (Parallel Random Access Model) has been widely regarded as a desirable parallel machine model for many years, but it is also believed to be "impossible in reality." As the new billion-transistor processor era begins, the eXplicit Multi-Threading (XMT) PRAM-On-Chip project is attempting to design an on-chip parallel processor that efficiently supports PRAM algorithms. This paper presents the first prototype of the XMT architecture, which incorporates 64 simple in-order processors operating at 75 MHz. The microarchitecture of the prototype is described, and its performance is studied on a set of micro-benchmarks. Using cycle-accurate emulation, the projected performance of an 800 MHz XMT ASIC processor is compared with that of a 2.6 GHz AMD Opteron, which occupies a silicon area similar to that of a 64-processor ASIC version of the XMT prototype. The results suggest that an XMT ASIC system clocked at only 800 MHz would outperform the 2.6 GHz AMD Opteron, with speedups ranging from 1.57 to 8.56.