Towards a first vertical prototyping of an extremely fine-grained parallel programming approach

Authors:
Dorit Naishlos;Joseph Nuzman;Chau-Wen Tseng;Uzi Vishkin
Affiliations:
Dept of Computer Science, University of Maryland, College Park, MD;Dept of Electrical and Computer Engineering, University of Maryland, College Park, MD and University of Maryland Institute of Advanced Computer Studies, College Park, MD;Dept of Computer Science, University of Maryland, College Park, MD and University of Maryland Institute of Advanced Computer Studies, College Park, MD;Dept of Electrical and Computer Engineering, University of Maryland, College Park, MD and University of Maryland Institute of Advanced Computer Studies, College Park, MD and Dept of Computer Scien ...
Venue:
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Year:
2001

Abstract

Explicit-multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style algorithms and (2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunity to evaluate the overall effectiveness of the interaction between the programming model and the hardware, and to enhance its performance where needed by incorporating new optimizations into the XMT compiler. We present a wide range of applications which, when written in XMT, obtain significant speedups relative to the best serial programs. We show that XMT is especially useful for more advanced applications with dynamic, irregular access patterns, while for regular computations we demonstrate performance gains that scale up to much higher levels than have been demonstrated before for on-chip systems.
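
To make the programming style mentioned in the abstract more concrete, the following is a minimal sketch of a fine-grained, PRAM-style array compaction in which many small tasks are handed out dynamically through a shared counter. It is not XMT code and is not taken from the paper: the names `FetchAndAdd`, `compact_nonzero`, and `num_workers` are hypothetical, and the lock-based counter is only a software stand-in for the kind of hardware-supported synchronization the abstract refers to.

```python
import threading


class FetchAndAdd:
    """Hypothetical software stand-in for a hardware fetch-and-add primitive.

    Assumption: real fine-grained hardware would do this without a lock;
    the lock here only guarantees atomicity in plain Python.
    """

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def fetch_add(self, inc):
        # Atomically return the old value and add inc to the counter.
        with self._lock:
            old = self._value
            self._value += inc
            return old

    @property
    def value(self):
        return self._value


def compact_nonzero(data, num_workers=4):
    """Copy the nonzero elements of data into a dense list.

    Each element is one tiny task. Workers claim tasks through a shared
    counter (dynamic load balancing) and claim output slots through a
    second counter, so the output order is not preserved.
    """
    out = [None] * len(data)
    next_task = FetchAndAdd()  # hands out task indices
    next_slot = FetchAndAdd()  # hands out output positions

    def worker():
        while True:
            i = next_task.fetch_add(1)
            if i >= len(data):
                return  # no tasks left
            if data[i] != 0:
                out[next_slot.fetch_add(1)] = data[i]

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[:next_slot.value]
```

A call such as `compact_nonzero([0, 3, 0, 5, 7, 0, 1])` returns the four nonzero values in some order. The point of the sketch is the structure (one short task per element, with coordination reduced to a single atomic increment), not performance, which Python threads will not deliver.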