Systolic array designs and dependency graphs represent some of the most important classes of algorithms in several areas of scientific computing. In this paper, we first propose an abstraction based on the fundamental principles of systolic array design. Building on this abstraction, we propose a methodology for mapping a dependency graph to a generic multicore processor. We then present two case studies, convolution and transitive closure, on two state-of-the-art multicore architectures, the Intel Xeon and the Cell processor, to illustrate these ideas. In both case studies, we achieved scalable results and higher performance than standard compiler optimizations and other recent implementations. We discuss the performance of the algorithms in light of the architectural features of the two multicore platforms.
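As a rough illustration of the kind of regular dependency structure such a mapping works with, the sketch below computes transitive closure with the textbook Warshall algorithm. It is not the implementation evaluated in the paper; the function name `transitive_closure` and the list-of-lists boolean matrix are assumptions chosen for this example. Step `k` depends on the result of step `k-1`, while the row updates within a single step are independent of one another, so they are the natural candidates for distribution across cores.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a square boolean adjacency matrix.

    Illustrative sketch only: `adj` is assumed to be a list of lists of
    bools, where adj[i][j] is True if there is an edge from i to j.
    """
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input is not modified

    for k in range(n):               # step k needs the result of step k-1
        row_k = reach[k]             # row k does not change during step k
        for i in range(n):           # rows are independent within a step
            if reach[i][k]:          # i reaches k, so i reaches all that k reaches
                row_i = reach[i]
                for j in range(n):
                    if row_k[j]:
                        row_i[j] = True
    return reach


if __name__ == "__main__":
    # A three-node chain: 0 -> 1 -> 2
    chain = [
        [False, True, False],
        [False, False, True],
        [False, False, False],
    ]
    for row in transitive_closure(chain):
        print(row)
```

The outer `k` loop is kept sequential here because each step reads the matrix produced by the previous one; a blocked or tiled variant would partition the inner `i` and `j` loops instead.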