A logarithmic time sort for linear size networks
Journal of the ACM (JACM)
Tight bounds on the complexity of parallel sorting
IEEE Transactions on Computers
Towards an architecture-independent analysis of parallel algorithms
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
On communication latency in PRAM computations
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
The APRAM: incorporating asynchrony into the PRAM model
SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Efficient parallel algorithms can be made robust
Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
Communication complexity of PRAMs
Theoretical Computer Science - Special issue: Fifteenth international colloquium on automata, languages and programming, Tampere, Finland, July 1988
A bridging model for parallel computation
Communications of the ACM
Efficient robust parallel computations
STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
A comparison of sorting algorithms for the connection machine CM-2
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Radix sort for vector multiprocessors
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Efficient PRAM simulation on a distributed memory machine
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
The Stanford Dash Multiprocessor
Computer
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Designing broadcasting algorithms in the postal model for message-passing systems
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimal broadcast and summation in the LogP model
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
An atomic model for message-passing
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Parallelism in random access machines
STOC '78 Proceedings of the tenth annual ACM symposium on Theory of computing
A Mathematical Theory of Communication
A Mathematical Theory of Communication
Effective distributed scheduling of parallel workloads
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
LogP: a practical model of parallel computation
Communications of the ACM
High-performance sorting on networks of workstations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
LoPC: modeling contention in parallel algorithms
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Load balanced parallel radix sort
ICS '98 Proceedings of the 12th international conference on Supercomputing
Scheduling with implicit information in distributed systems
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
LoGPC: modeling network contention in message-passing programs
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Predictive analysis of a wavefront application using LogGP
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Compact grid layouts of multi-level networks
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Problem space promotion and its evaluation as a technique for efficient parallel computation
ICS '99 Proceedings of the 13th international conference on Supercomputing
BOS is boss: a case for bulk-synchronous object systems
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Parallel sorting on cache-coherent DSM multiprocessors
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
LoGPC: Modeling Network Contention in Message-Passing Programs
IEEE Transactions on Parallel and Distributed Systems
LogGPS: a parallel computational model for synchronization analysis
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Implicit coscheduling: coordinated scheduling with implicit information in distributed systems
ACM Transactions on Computer Systems (TOCS)
Partitioned parallel radix sort
Journal of Parallel and Distributed Computing
Parallel Merge Sort with Load Balancing
International Journal of Parallel Programming
Load-Balanced Parallel Merge Sort on Distributed Memory Parallel Computers
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Partitioned Parallel Radix Sort
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Parallelizing Merge Sort onto Distributed Memory Parallel Computers
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Parallel Sorting on the NEC Cenju-3 and IBM SP2
HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
A Parallel Computational Model for Heterogeneous Clusters
IEEE Transactions on Parallel and Distributed Systems
Efficient sample sort and the average case analysis or PEsort
Theoretical Computer Science
Performance engineering, PSEs and the GRID
Scientific Programming
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Towards using and improving the NAS parallel benchmarks: a parallel patterns approach
Proceedings of the 2010 Workshop on Parallel Programming Patterns
Hi-index | 0.02 |
In this paper, we analyze four parallel sorting algorithms (bitonic, column, radix, and sample sort) with the LogP model. LogP characterizes the performance of modern parallel machines with a small set of parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). We develop implementations of these algorithms in Split-C, a parallel extension to C, and compare the performance predicted by LogP to actual performance on a CM-5 of 32 to 512 processors for a range of problem sizes. We evaluate the robustness of the algorithms by varying the distribution and ordering of the key values. We also briefly examine the sensitivity of the algorithms to the communication parameters.We show that the LogP model is a valuable guide in the development of parallel algorithms and a good predictor of implementation performance. The model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. With an empirical model of local processor performance, LogP predictions closely match observed execution times on uniformly distributed keys across a broad range of problem and machine sizes. We find that communication performance is oblivious to the distribution of the key values, whereas the local processor performance is not; some communication phases are sensitive to the ordering of keys due to contention. Finally, our analysis shows that overhead is the most critical communication parameter in the sorting algorithms.