An introduction to the theory of lists
Proceedings of the NATO Advanced Study Institute on Logic of programming and calculi of discrete design
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Parallelizing complex scans and reductions
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
An Efficient Parallel Algorithm for the Solution of a Tridiagonal Linear System of Equations
Journal of the ACM (JACM)
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Synthesis of Parallel Algorithms
Synthesis of Parallel Algorithms
Automatic intra-register vectorization for the Intel architecture
International Journal of Parallel Programming
Parallelization via Context Preservatio
ICCL '98 Proceedings of the 1998 International Conference on Computer Languages
Towards automatic parallelization of tree reductions in dynamic programming
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Automatic inversion generates divide-and-conquer parallel programs
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Optimizing the parallel computation of linear recurrences using compact matrix representations
Journal of Parallel and Distributed Computing
A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations
IEEE Transactions on Computers
Implementing fusion-equipped parallel skeletons by expression templates
IFL'09 Proceedings of the 21st international conference on Implementation and application of functional languages
Automatic parallelization of recursive functions using quantifier elimination
FLOPS'10 Proceedings of the 10th international conference on Functional and Logic Programming
Domain-specific optimization strategy for skeleton programs
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Scan detection and parallelization in "inherently sequential" nested loop programs
Proceedings of the Tenth International Symposium on Code Generation and Optimization
ESOP'12 Proceedings of the 21st European conference on Programming Languages and Systems
Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
Hi-index | 0.00 |
Existing work that deals with parallelization of complicated reductions and scans focuses only on formalism and hardly dealt with implementation. To bridge the gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can deal with complicated loops that existing techniques in compilers cannot parallelize. Moreover, we have sophisticated our framework by developing two sets of techniques. One enhances its capability for parallelization by extracting max-operators automatically, and the other improves the performance of parallelized programs by eliminating redundancy. We have also implemented our framework and techniques as a parallelizer in a compiler. Experiments on examples that existing compilers cannot parallelize have demonstrated the scalability of programs parallelized by our implementation.