This paper presents a novel method for optimizing the parallel computation of linear recurrences. The method can reduce both the memory and the computation required. Its distinguishing feature is that it formulates linear recurrences as matrix computations and then exploits the mathematical properties of those matrices to obtain more compact representations. Based on a general notion of closure under matrix multiplication, we present two classes of matrices that have compact representations: permutation matrices and matrices whose elements are linearly related to each other. To validate the method, we solve recurrences whose matrices have compact representations using CUDA on an NVIDIA GeForce 8800 GTX GPU. The technique enables larger recurrences to be computed in parallel and yields speedups of up to eleven times over the unoptimized parallel computations, while memory usage can be as much as nine times lower. Our results point to a promising approach for the adoption of more advanced parallelization techniques.
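The idea of closure under matrix multiplication can be illustrated with a minimal Python sketch. It is not the paper's implementation: the first-order recurrence x_i = a_i * x_(i-1) + b_i, the function names, and the choice of the affine form [[a, b], [0, 1]] as the example family are all assumptions made here for illustration. Because the product of two matrices in a closed family stays in the family, each matrix can be stored compactly (a pair of numbers, or an index array for a permutation matrix), and the associative composition could in principle drive a parallel scan. The sketch runs the scan sequentially with `itertools.accumulate`.

```python
from functools import reduce
from itertools import accumulate


def matmul(A, B):
    """Dense matrix product on nested lists (reference implementation)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def dense_step(a, b):
    """x_i = a*x_{i-1} + b written as [x_i, 1]^T = M [x_{i-1}, 1]^T."""
    return [[a, b], [0, 1]]


def compose_affine(later, earlier):
    """Compact product: [[a2, b2], [0, 1]] @ [[a1, b1], [0, 1]] keeps the
    same shape, so only the pair (a, b) needs to be stored."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)


def perm_to_dense(p):
    """Permutation stored as an index array: row i has its 1 in column p[i]."""
    n = len(p)
    return [[1 if j == p[i] else 0 for j in range(n)] for i in range(n)]


def compose_perm(p, q):
    """Compact product of permutation matrices P @ Q: O(n) instead of O(n^3)."""
    return [q[p[i]] for i in range(len(p))]


def solve_sequential(coeffs, x0):
    """Reference: evaluate the recurrence one step at a time."""
    xs, x = [], x0
    for a, b in coeffs:
        x = a * x + b
        xs.append(x)
    return xs


def solve_by_scan(coeffs, x0):
    """Prefix-compose the compact matrices, then apply each prefix to x0.
    compose_affine is associative, so a parallel scan could replace
    accumulate; here it runs sequentially."""
    prefixes = accumulate(coeffs, lambda acc, nxt: compose_affine(nxt, acc))
    return [a * x0 + b for a, b in prefixes]


if __name__ == "__main__":
    coeffs = [(2, 1), (3, -1), (1, 4), (-2, 5)]  # hypothetical (a_i, b_i)
    print(solve_sequential(coeffs, 3))
    print(solve_by_scan(coeffs, 3))
```

In this sketch each 2x2 step matrix shrinks from four stored entries to two, and an n-by-n permutation matrix shrinks from n^2 entries to n. The savings reported in the abstract come from the paper's own matrix classes and CUDA implementation, not from this example.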