We study the feasibility and efficiency of two new parallel algorithms that sample random permutations of the integers [M] = {1, ..., M}. The first reduces the communication volume for p processors from O(M) words (O(M log M) bits, the coding size of the permutation) to O(M log p / log M) words (O(M log p) bits, the coding size of a partition of [M] into subsets of size M/p). The second exploits the common practice of using pseudorandom numbers instead of true randomness; it reduces the communication even further, to a bandwidth proportional to the amount of true randomness actually consumed. Careful engineering of the required subroutines is necessary to obtain a competitive implementation. The second approach in particular shows very good results, as demonstrated by large-scale experiments: it achieves high scalability and outperforms previously known approaches by a wide margin. First, we compare our algorithm to the classical sequential data-shuffle algorithm, obtaining a speedup of about 1.5. We then show that the algorithm parallelizes well on a multicore system and scales to a cluster with 440 cores.
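The sequential baseline mentioned above is, by standard convention, the in-place data-shuffle (Fisher–Yates/Knuth) algorithm, which produces a uniformly random permutation with one random draw per element. A minimal sketch in Python (function name and seeding interface are illustrative, not taken from the paper):

```python
import random

def shuffle_permutation(m, seed=None):
    """Sample a uniformly random permutation of [m] = {1, ..., m}
    using the classical in-place data-shuffle (Fisher-Yates) algorithm."""
    rng = random.Random(seed)
    perm = list(range(1, m + 1))
    # Walk from the last position down; swap each element with a
    # uniformly chosen element at or before its position.
    for i in range(m - 1, 0, -1):
        j = rng.randint(0, i)  # uniform in {0, ..., i}
        perm[i], perm[j] = perm[j], perm[i]
    return perm
```

Each of the m - 1 iterations consumes one uniform random index, so the total work and the randomness used are both O(m); the parallel algorithms in the paper aim to beat the O(M)-word communication cost that a naive distribution of this procedure would incur.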