Many digital signal and image processing algorithms can be sped up by executing them in parallel on multiple processors. The speed of parallel execution is limited by the need for communication and synchronization between processors. In this paper, we present a paradigm for parallel processing that we call the block data flow paradigm (BDFP). The goal of this paradigm is to reduce interprocessor communication and relax the synchronization requirements for such applications. We present the block data parallel architecture, which implements this paradigm, and methods for mapping algorithms onto it. We illustrate this methodology for several applications, including two-dimensional (2-D) digital filters, the 2-D discrete cosine transform, QR decomposition of a matrix, and Cholesky factorization of a matrix. We analyze the resulting system performance for these applications with regard to speedup and efficiency as the number of processors increases. Our results demonstrate that the block data parallel architecture is a flexible, high-performance solution for numerous digital signal and image processing algorithms.
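The speedup and efficiency measures mentioned above can be illustrated with a minimal sketch. It assumes the conventional definitions (speedup as serial time divided by parallel time, efficiency as speedup divided by processor count); the function names and the timing values are hypothetical and are not results from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, num_processors: int) -> float:
    """Conventional efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / num_processors


if __name__ == "__main__":
    # Hypothetical run times in seconds, keyed by processor count.
    # These are illustrative values only, not measurements from the paper.
    timings = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}
    t_serial = timings[1]
    for p, t_parallel in timings.items():
        s = speedup(t_serial, t_parallel)
        e = efficiency(t_serial, t_parallel, p)
        print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

In these made-up timings, efficiency falls as processors are added because the run time stops shrinking in proportion, which is the kind of effect that communication and synchronization overhead would produce.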