The Journal of Supercomputing
Empirical search is an emerging strategy used in systems like ATLAS, FFTW and SPIRAL to find the implementation parameter values that deliver near-optimal performance on a particular machine. However, this approach has only proven successful for scientific kernels and serial symbolic sorting. Even commercial libraries like Intel MKL or IBM ESSL do not include parallel versions of sorting routines. In this paper we study empirical search in the generation of parallel sorting routines for multi-core systems. Parallel sorting presents new challenges: the relative performance of the algorithms depends not only on the characteristics of the architecture and the input data, but also on the data partitioning schemes and thread interactions. We have studied parallel sorting algorithms including quick sort, cache-conscious radix sort, multiway merge sort, sample sort and quick-radix sort, and have built a sorting library using empirical search and an artificial neural network. Our results show that this sorting library can generate the best parallel sorting algorithms for different input sets on both x86 and SPARC multicore architectures, with peak speedups of 2.2x and 3.9x, respectively.
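As a rough illustration of the empirical search idea described above, the following sketch times a few candidate sorting routines on a sample input and picks the fastest. It is not the library from the paper: the candidates are serial, the neural network is omitted, and every name here (radix_sort, merge_sort, CANDIDATES, make_sample, empirical_search) is hypothetical and chosen only for this example.

```python
import random
import time


def radix_sort(values, bits_per_pass=8):
    """LSD radix sort for non-negative integers (serial, illustrative only)."""
    out = list(values)
    if not out:
        return out
    mask = (1 << bits_per_pass) - 1
    max_value = max(out)
    shift = 0
    # One stable bucketing pass per digit, least significant digit first.
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(mask + 1)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out


def merge_sort(values):
    """Top-down two-way merge sort (serial, illustrative only)."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left, right = merge_sort(values[:mid]), merge_sort(values[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Hypothetical candidate set; the paper's library uses parallel algorithms.
CANDIDATES = {"builtin": sorted, "radix": radix_sort, "merge": merge_sort}


def make_sample(n, key_bits=16, seed=0):
    """Generate n pseudo-random non-negative integer keys."""
    rng = random.Random(seed)
    return [rng.randrange(1 << key_bits) for _ in range(n)]


def empirical_search(sample, candidates=CANDIDATES, repeats=3):
    """Time each candidate on the sample; return the fastest name and all timings."""
    timings = {}
    for name, sort_fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            sort_fn(sample)
            best = min(best, time.perf_counter() - start)
        timings[name] = best  # best-of-N reduces timing noise
    winner = min(timings, key=timings.get)
    return winner, timings
```

Calling empirical_search(make_sample(100_000)) returns the name of the fastest candidate on the current machine together with the measured times. The winner may differ across machines and input distributions, which is the motivation for searching empirically rather than fixing one algorithm in advance.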