Logarithmic time cost optimal parallel sorting is not yet fast in practice!

Authors:
Lasse Natvig
Affiliations:
Division of Computer Systems and Telematics, The Norwegian Institute of Technology, The University of Trondheim, N-7034 Trondheim, Norway
Venue:
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Year:
1990

Citing 16

A taxonomy of parallel sorting. ACM Computing Surveys (CSUR)
An introduction to programming in SIMULA
Designing efficient algorithms for parallel computers
DEMOS: a system for discrete event modelling on Simula
Efficient parallel algorithms
Programming pearls
Sorting in c log n parallel steps. Combinatorica
Parallel merge sort. SIAM Journal on Computing
The design and analysis of parallel algorithms
Parallel Sorting Algorithms
Data Structures and Algorithms
Parallelism in random access machines. STOC '78 Proceedings of the tenth annual ACM symposium on Theory of computing
Tight bounds on the complexity of parallel sorting. STOC '84 Proceedings of the sixteenth annual ACM symposium on Theory of computing
Improved Sorting Networks with O(log n) Depth
The complexity of parallel computations
Simula Begin

Cited 2

Radix sort for vector multiprocessors. Proceedings of the 1991 ACM/IEEE conference on Supercomputing
GPU-ABiSort: optimal parallel sorting on stream architectures. IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Abstract

When looking for new and faster parallel sorting algorithms for use in massively parallel systems, it is tempting to investigate promising alternatives from the large body of research done on parallel sorting in the field of theoretical computer science. Such “theoretical” algorithms are mainly described for the PRAM (Parallel Random Access Machine) model of computation [13,26]. This paper shows how this kind of investigation can be done in a simple but versatile environment for programming and measuring PRAM algorithms [19,20]. The practical value of Cole's Parallel Merge Sort algorithm [10,11] has been investigated by comparing it with Batcher's bitonic sorting [5]. The O(log n) time consumption of Cole's algorithm implies that it must be faster than bitonic sorting, which takes O(log² n) time, if n is large enough. However, we have found that bitonic sorting is faster as long as n is less than 1.2 × 10²¹, i.e. more than 1 Giga Tera items! Consequently, Cole's logarithmic time algorithm is not fast in practice.
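The crossover argument in the abstract can be illustrated with a deliberately simplified model. The sketch below assumes running times of the form a·log2(n) for the logarithmic algorithm and b·log2(n)² for the log-squared one. The constants a and b and the function names are hypothetical, introduced only for illustration, and this is not the cost model measured in the paper. Under that assumption the two curves meet where log2(n) = a/b, so a crossover near n = 1.2 × 10²¹ would correspond to a constant-factor ratio of roughly 70.

```python
import math


def log2n_crossover(a: float, b: float) -> float:
    """Return log2(n) where a*log2(n) equals b*log2(n)**2 (for n > 1).

    a and b are hypothetical constant factors, not values from the paper.
    """
    return a / b


def faster_algorithm(n: float, a: float, b: float) -> str:
    """Name the cheaper algorithm at input size n under the simplified model."""
    lg = math.log2(n)
    t_logarithmic = a * lg        # models an O(log n) algorithm
    t_log_squared = b * lg * lg   # models an O(log^2 n) algorithm
    return "logarithmic" if t_logarithmic < t_log_squared else "log-squared"


# Constant-factor ratio a/b implied by a crossover at n = 1.2e21,
# assuming the simplified model above.
implied_ratio = math.log2(1.2e21)
print(round(implied_ratio, 1))  # 70.0

# With a/b = 70, the log-squared algorithm wins for about a million items,
# and the logarithmic one only wins once log2(n) exceeds 70.
print(faster_algorithm(2**20, a=70.0, b=1.0))  # log-squared
print(faster_algorithm(2**80, a=70.0, b=1.0))  # logarithmic
```

The point of the sketch is only that an asymptotically better bound says nothing about where the crossover lies: a large enough constant factor pushes it far beyond any realistic input size.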