Embedded Parallel computing architecture with Unique Memory Access (ePUMA) is a domain-specific, embedded, heterogeneous 9-core chip multiprocessor. Its design targets low power and high silicon efficiency for high-throughput DSP in emerging telecommunication and multimedia applications. Sorting is one of the most widely studied algorithmic problems, and a growing number of embedded applications also need efficient sorting. This paper proposes eSORT, an efficient bitonic sorting algorithm for the novel ePUMA DSP. eSORT consists of two parts: an in-core sorting algorithm and an intra-core sorting algorithm. Both algorithms are adapted to the novel architecture and take advantage of the ePUMA platform. This paper implements and evaluates eSORT on a range of datasets on the ePUMA multi-core DSP and compares its performance with that of the Cell BE processor, using the same SIMD parallelization structure. The results show that bitonic sort on the ePUMA multi-core DSP achieves much better performance and scalability. Compared with an optimized bitonic sort on Cell BE, the in-core sort is 11 times faster and the intra-core sort is 15 times faster on average.
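For readers unfamiliar with the underlying technique, the following is a minimal sketch of a textbook bitonic sorting network in Python. It is not the eSORT implementation and does not model the ePUMA or Cell BE hardware; the function name `bitonic_sort` and the power-of-two length restriction are assumptions made for illustration. The point it shows is that the compare-exchange pattern depends only on the indices, not on the data, which is what makes bitonic sort attractive for SIMD and multi-core mapping.

```python
def bitonic_sort(values):
    """Return a sorted copy of `values` using a bitonic sorting network.

    Illustrative sketch only: the input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")

    a = list(values)
    k = 2
    while k <= n:              # k: length of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # j: distance between compare-exchange partners
            for i in range(n):
                partner = i ^ j
                if partner > i:                  # handle each pair once
                    ascending = (i & k) == 0     # direction of this sub-block
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Every iteration of the inner `for` loop at a fixed `(k, j)` touches a disjoint pair of elements, so those compare-exchange steps could in principle run in parallel, for example across SIMD lanes or cores.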