The subset-sum problem is a well-known NP-complete combinatorial problem that is solvable in pseudo-polynomial time, that is, time proportional to the number of input objects multiplied by the sum of their sizes. This product defines the size of the dynamic programming table used to solve the problem. We show how this problem can be parallelized on three contemporary architectures: a 128-processor Cray Extreme Multithreading (XMT) massively multithreaded machine, a 16-processor IBM x3755 shared-memory machine, and a 240-core NVIDIA FX 5800 graphics processing unit (GPU). We show that parallelizing this algorithm on the Cray XMT is straightforward, primarily because of the word-level locking available on that architecture. For the other two machines, we present an alternating word algorithm that yields an efficient solution. Our results show that the GPU performs well on problems whose tables fit within the device memory. Because GPUs typically have memories on the order of 10 GB, such architectures are best suited to small problems with tables of approximately 10^10 bits. The IBM x3755 performs very well on medium-sized problems that fit within its 64-GB memory, but it scales poorly as the number of processors increases and cannot sustain performance as the problem size grows; it tends to saturate at problem sizes of about 10^11 bits. The Cray XMT scales very well on large problems and sustains its performance as the problem size increases; however, it scales poorly on small problems and performs best at problem sizes of 10^12 bits or more. These results illustrate that the subset-sum problem can be parallelized well on all three architectures, albeit over different ranges of problem sizes. The behavior of the three machines under varying problem sizes highlights the strengths and weaknesses of the three architectures. Copyright © 2012 John Wiley & Sons, Ltd.
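For reference, the serial pseudo-polynomial dynamic program that the abstract alludes to can be sketched as follows. This is a minimal illustration of the classic table-filling algorithm, not the paper's parallel implementation; the function and variable names are assumptions for the sketch.

```python
def subset_sum(weights, target):
    """Classic pseudo-polynomial DP for subset-sum.

    reachable[s] is True iff some subset of `weights` sums to exactly s.
    Time and space are proportional to len(weights) * target, which is
    the dynamic programming table size the abstract refers to.
    """
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for w in weights:
        # Scan sums high-to-low so each object is used at most once.
        for s in range(target, w - 1, -1):
            if reachable[s - w]:
                reachable[s] = True
    return reachable[target]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))  # True: 4 + 5 = 9
```

Each row of the table depends only on the previous row's bits, which is what makes row-wise parallelization attractive: the inner loop over sums can be distributed across processors, at the cost of synchronizing updates to the shared bit vector (hence the appeal of the Cray XMT's word-level locking).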