Using Emulations to Enhance the Performance of Parallel Architectures

Authors:
Bojana Obrenić; Martin C. Herbordt; Arnold L. Rosenberg; Charles C. Weems
Affiliations:
Queen's College, Flushing, NY;Univ. of Houston, Houston, TX;Univ. of Massachusetts, Amherst;Univ. of Massachusetts, Amherst
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1999


Abstract

We illustrate the potential of techniques and results from the theory of network emulations to enhance the performance of a parallel architecture. The vehicle for this demonstration is a suite of algorithms that endow an $N$-processor bit-serial processor array ${\cal A}$ with a "meta-instruction" GAUGE $k$, which (logically) reconfigures ${\cal A}$ into an $N/k$-processor virtual machine ${\cal B}_k$ that has: 1) a datapath and memory bus whose emulated width is $k$ bits, as opposed to ${\cal A}$'s 1-bit width and 2) an instruction set that operates on $k$-bit words, in contrast to ${\cal A}$'s instruction set, which operates on 1-bit words. In order to stress the strength of the approach, we show (via pseudocode) how our emulation techniques can be implemented efficiently even if ${\cal A}$ operates in strict SIMD mode, with only single-bit masking capabilities and with no indexed memory accesses. We describe at an algorithmic level how to implement our technique, including datapath conversion ("corner-turning") and the creation of the word-parallel instruction sets, on arrays of any regular network topology. We instantiate our technique in detail for arrays based on topologies with quite disparate characteristics: the hypercube, the de Bruijn network, and a genre of mesh with reconfigurable buses. Importantly, the emulations that underlie our technique do not alter the native machine's instruction set, hence allowing an invariant programming model across gauges.
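As a rough illustration of the data movement that "corner-turning" refers to, the following Python sketch models it as a transpose of a $k \times k$ bit matrix within one group of $k$ processors. It is a toy model under stated assumptions, not the paper's SIMD pseudocode: the function names `virtual_groups` and `corner_turn`, the grouping of processors by contiguous index, and the packing of bits into Python integers are all hypothetical choices made here for illustration, and the sketch ignores the network topology entirely.

```python
def virtual_groups(n, k):
    """Partition n physical 1-bit processors into n // k virtual k-bit processors.

    Assumption (not from the source): processors are grouped by contiguous index.
    """
    if k <= 0 or n % k != 0:
        raise ValueError("k must be positive and divide n")
    return [list(range(g * k, (g + 1) * k)) for g in range(n // k)]


def corner_turn(words, k):
    """Transpose a k x k bit matrix held by one group of k processors.

    words[p] is the k-bit word that processor p stores bit-serially.
    The result lanes[b] packs bit b of every word: bit p of lanes[b]
    equals bit b of words[p].
    """
    if len(words) != k:
        raise ValueError("expected exactly k words")
    lanes = [0] * k
    for p, w in enumerate(words):
        for b in range(k):
            lanes[b] |= ((w >> b) & 1) << p
    return lanes


if __name__ == "__main__":
    k = 4
    words = [0b1111, 0b0000, 0b0101, 0b0011]
    lanes = corner_turn(words, k)
    # Transposing twice restores the original layout.
    assert corner_turn(lanes, k) == words
    print(virtual_groups(8, k))
```

Because a transpose is its own inverse, applying `corner_turn` twice returns the original layout, which is one way to sanity-check a conversion between the bit-serial and word-parallel views in this toy model.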