Existing approaches to higher-order vectorisation, also known as flattening nested data parallelism, do not preserve the asymptotic work complexity of the source program. Straightforward examples, such as sparse matrix-vector multiplication, can suffer a severe blow-up in both time and space, which limits the practicality of the method. We discuss why this problem arises, identify the mishandling of index space transforms as the root cause, and present a solution based on a refined representation of nested arrays. We have implemented this solution in Data Parallel Haskell (DPH) and present benchmarks showing that realistic programs that previously suffered the blow-up now have the correct asymptotic work complexity. In some cases, the asymptotic complexity of the vectorised program is even better than that of the original.
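To make the blow-up concrete, the following minimal sketch contrasts two ways of evaluating sparse matrix-vector multiplication over a flattened (segmented) array. It is written in Python purely for illustration and is not DPH's representation or API. All names (`flatten`, `smvm_replicating`, `smvm_sharing`) are hypothetical, and the sketch assumes that the naive flattened program physically copies the dense vector once per matrix row. Under that assumption the replicating version touches rows × columns elements, whereas the sharing version touches one element per non-zero entry.

```python
from itertools import accumulate


def flatten(nested):
    """Segmented form of a nested list: (segment lengths, flat data)."""
    lengths = [len(row) for row in nested]
    data = [x for row in nested for x in row]
    return lengths, data


def smvm_replicating(lengths, cols, vals, vector):
    """Assumed naive scheme: copy the dense vector once per row, then index the copies."""
    n = len(vector)
    replicated = vector * len(lengths)        # rows * n elements, even for empty rows
    starts = [0] + list(accumulate(lengths))  # offset of each row's segment
    result = []
    for row in range(len(lengths)):
        lo, hi = starts[row], starts[row + 1]
        result.append(sum(vals[k] * replicated[row * n + cols[k]]
                          for k in range(lo, hi)))
    return result, len(replicated)


def smvm_sharing(lengths, cols, vals, vector):
    """Index the single shared vector directly: one product per non-zero entry."""
    starts = [0] + list(accumulate(lengths))
    products = [v * vector[c] for c, v in zip(cols, vals)]
    sums = [sum(products[starts[r]:starts[r + 1]]) for r in range(len(lengths))]
    return sums, len(products)


# Hypothetical 3x3 sparse matrix: each row lists (column, value) pairs.
rows = [[(0, 2.0), (2, 1.0)], [], [(1, 3.0)]]
vector = [1.0, 10.0, 100.0]

lengths, entries = flatten(rows)
cols = [c for c, _ in entries]
vals = [v for _, v in entries]

print(smvm_replicating(lengths, cols, vals, vector))  # second component: 9 elements copied
print(smvm_sharing(lengths, cols, vals, vector))      # second component: 3 products formed
```

Both functions return the same product vector. Only the second component differs: it counts the intermediate elements materialised, which is the quantity that grows with matrix size rather than with the number of non-zero entries in the replicating version.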