In this paper we present an approximate algorithm for detecting and filtering out data dependences whose distance between memory references is sufficiently large. A sequence of identical operations (typically enclosed in a 'for' loop) can be replaced with a single SIMD operation if the distance between memory references is greater than or equal to the number of data elements processed in one SIMD register. Consequently, some loops that cannot be vectorized on traditional vector processors can still be parallelized for short SIMD execution. A number of approximate data dependence tests have been proposed in the literature, but all of them assume a dependence even in cases where no dependence actually restricts parallelization under the short SIMD execution model. By examining the properties of the linear subscript expressions of possibly conflicting data references, our algorithm allows parallelization to proceed when certain sufficient conditions on the dependence distance are met. Our method is based on the Banerjee test and checks the minimum and maximum distances between memory references within the iteration space rather than searching for an integer solution to the dependence equation. The proposed method extends the accuracy and applicability of the classical Banerjee test.
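The idea of bounding distances instead of solving the dependence equation can be illustrated with a minimal sketch. It is not the algorithm from the paper; it is one conservative formulation, assuming a single loop with index range [lo, hi], one write a[a1*i + b1], one read a[a2*j + b2], and a SIMD register holding vl elements. The function names (linear_bounds, classical_banerjee_may_depend, simd_safe) and all parameters are hypothetical and chosen only for this illustration.

```python
def linear_bounds(terms, const):
    """Banerjee-style bounds of const + sum(c * x), each x ranging over [lo, hi]."""
    lo_sum = hi_sum = const
    for c, (lo, hi) in terms:
        lo_sum += min(c * lo, c * hi)
        hi_sum += max(c * lo, c * hi)
    return lo_sum, hi_sum


def classical_banerjee_may_depend(a1, b1, a2, b2, lo, hi):
    """Classical test: assume a dependence if (a1*i + b1) - (a2*j + b2)
    can be zero for real-valued i, j anywhere in [lo, hi]."""
    dmin, dmax = linear_bounds([(a1, (lo, hi)), (-a2, (lo, hi))], b1 - b2)
    return dmin <= 0 <= dmax


def simd_safe(a1, b1, a2, b2, lo, hi, vl):
    """Sufficient (not necessary) condition for grouping vl consecutive
    iterations: no two distinct iterations i and j = i + d with |d| < vl
    can touch the same element. Assumption of this sketch: j is not
    clipped to [lo, hi], which only makes the check more conservative."""
    if vl <= 1:
        return True
    # Distance h(i, d) = (a1*i + b1) - (a2*(i + d) + b2)
    #                  = (a1 - a2)*i - a2*d + (b1 - b2)
    # d = 0 is skipped: a same-iteration reuse does not block vectorization.
    for d_range in ((1, vl - 1), (-(vl - 1), -1)):
        dmin, dmax = linear_bounds(
            [(a1 - a2, (lo, hi)), (-a2, d_range)], b1 - b2
        )
        if dmin <= 0 <= dmax:
            return False  # a conflict inside one SIMD group cannot be ruled out
    return True
```

Under these assumptions, the loop body a[i + 4] = a[i] over i in [0, 99] is flagged as dependent by the classical bound check, yet simd_safe accepts it for vl = 4 and rejects it for vl = 8, which matches the distance condition stated in the abstract.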