We illustrate the potential of techniques and results from the theory of network emulations to enhance the performance of a parallel architecture. The vehicle for this demonstration is a suite of algorithms that endow an $N$-processor bit-serial processor array ${\cal A}$ with a "meta-instruction" GAUGE $k$, which (logically) reconfigures ${\cal A}$ into an $N/k$-processor virtual machine ${\cal B}_k$ that has: 1) a datapath and memory bus whose emulated width is $k$ bits, as opposed to ${\cal A}$'s 1-bit width, and 2) an instruction set that operates on $k$-bit words, in contrast to ${\cal A}$'s instruction set, which operates on 1-bit words. In order to stress the strength of the approach, we show (via pseudocode) how our emulation techniques can be implemented efficiently even if ${\cal A}$ operates in strict SIMD mode, with only single-bit masking capabilities and with no indexed memory accesses. We describe at an algorithmic level how to implement our technique, including datapath conversion ("corner-turning") and the creation of the word-parallel instruction sets, on arrays of any regular network topology. We instantiate our technique in detail for arrays based on topologies with quite disparate characteristics: the hypercube, the de Bruijn network, and a genre of mesh with reconfigurable buses. Importantly, the emulations that underlie our technique do not alter the native machine's instruction set, hence allowing an invariant programming model across gauges.
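
As a rough illustration of the datapath conversion mentioned above, corner-turning can be viewed as transposing a $k \times k$ bit matrix: $k$ processors that each hold one $k$-bit word end up each holding one bit position of every word. The sketch below is a sequential Python model of that data movement only. It is not the paper's SIMD algorithm, and the function name `corner_turn` and the integer-per-processor representation are assumptions made for this example.

```python
def corner_turn(words, k):
    """Transpose a k x k bit matrix stored as a list of k integers.

    Hypothetical model: words[i] is the k-bit word held by processor i.
    In the result, entry j collects bit j of every input word, with the
    bit taken from words[i] placed at bit position i.
    """
    if len(words) != k:
        raise ValueError("expected exactly k words")
    turned = [0] * k
    for i, word in enumerate(words):
        for j in range(k):
            bit = (word >> j) & 1      # bit j of the word held by processor i
            turned[j] |= bit << i      # becomes bit i of output entry j
    return turned


# Four 4-bit words, one per (modelled) processor.
original = [0b0001, 0b0011, 0b0111, 0b1111]
turned = corner_turn(original, 4)
restored = corner_turn(turned, 4)      # transposing twice restores the input
```

Because a transpose is its own inverse, applying `corner_turn` twice returns the original words, which makes a convenient sanity check for any implementation of the conversion.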