Accelerating the dynamic programming for the optimal polygon triangulation on the GPU
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
An optimal parallel prefix-sums algorithm on the memory machine models for GPUs
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Hi-index | 0.00 |
Modern GPUs (Graphics Processing Units) can be used for general purpose parallel computation. Users can develop parallel programs running on GPUs using programming architecture called CUDA (Compute Unified Device Architecture). The Matrix Chain Product Problem is an optimization problem for finding parentheses of the matrix chain that gives the minimum total number of multiplications necessary to compute the product of the matrix chain. It is well known that this problem can be solved using the dynamic programming technique in $O(n^3)$ time using tables of size $O(n^2)$. The main contribution of this paper is to present an efficient parallel implementation of this $O(n^3)$-time algorithm on the GPU. In our implementation, we have considered the architecture and programming issues of the GPU system. The experimental results show that, for a chain of 16384 matrices generated at random, our implementation in the Nvidia GeForce GTX 480 achieves a speedup factor of 40 over a conventional CPU implementation.