Automatic parallelization via matrix multiplication

Authors:
Shigeyuki Sato; Hideya Iwasaki
Affiliations:
The University of Electro-Communications, Tokyo, Japan; The University of Electro-Communications, Tokyo, Japan
Venue:
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
Year:
2011

Abstract

Existing work on parallelizing complicated reductions and scans focuses on formalism and has hardly dealt with implementation. To bridge the gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can deal with complicated loops that existing techniques in compilers cannot parallelize. Moreover, we have refined our framework by developing two sets of techniques. One enhances its capability for parallelization by extracting max-operators automatically, and the other improves the performance of parallelized programs by eliminating redundancy. We have also implemented our framework and techniques as a parallelizer in a compiler. Experiments on examples that existing compilers cannot parallelize have demonstrated the scalability of programs parallelized by our implementation.
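
To make the idea of parallelization via matrix multiplication concrete, here is a minimal Python sketch. It is an illustration, not the authors' implementation. It assumes a hypothetical loop `x = max(x + a, 0)` and rewrites each iteration as a 2x2 matrix over the (max, +) algebra, where "addition" is max and "multiplication" is +. Because this matrix product is associative, the per-element matrices can be multiplied chunk by chunk, and each chunk could go to a separate worker. All names (`maxplus_matmul`, `step_matrix`, `chunked`) are invented for this sketch.

```python
from functools import reduce

NEG_INF = float("-inf")  # the "zero" of the (max, +) algebra

def maxplus_matmul(A, B):
    """Matrix product where max plays the role of + and + the role of *."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Identity matrix of the (max, +) algebra: 0 on the diagonal, -inf elsewhere.
IDENTITY = [[0, NEG_INF], [NEG_INF, 0]]

def step_matrix(a):
    """One iteration x' = max(x + a, 0) acting on the state vector (x, 0)."""
    return [[a, 0], [NEG_INF, 0]]

def sequential(xs):
    """The original loop with its loop-carried dependence (assumed example)."""
    x = 0
    for a in xs:
        x = max(x + a, 0)
    return x

def chunked(xs, chunks=4):
    """Same result, computed from independent per-chunk matrix products."""
    size = max(1, -(-len(xs) // chunks))  # ceiling division
    parts = [xs[i:i + size] for i in range(0, len(xs), size)]
    # Each partial product depends only on its own chunk, so this
    # comprehension is the part that could run in parallel.
    partials = [reduce(lambda acc, a: maxplus_matmul(step_matrix(a), acc),
                       part, IDENTITY)
                for part in parts]
    # Combine the partial products in order; later iterations multiply on the left.
    total = reduce(lambda acc, m: maxplus_matmul(m, acc), partials, IDENTITY)
    # Apply the combined matrix to the initial state (x, 0) = (0, 0).
    state = maxplus_matmul(total, [[0], [0]])
    return state[0][0]

data = [3, -5, 2, 4, -1, -7, 6, 1]
result_sequential = sequential(data)
result_chunked = chunked(data)
```

On `data`, both `sequential` and `chunked` evaluate to 7. The sketch runs the chunks one after another; distributing `partials` across threads or processes is left out to keep the example short.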