Resource oblivious sorting on multicores

Authors:
Richard Cole;Vijaya Ramachandran
Affiliations:
Computer Science Dept., Courant Institute, NYU, New York, NY;Dept. of Computer Science, Univ. of Texas, Austin, TX
Venue:
ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming
Year:
2010

Citing 9
Cited 5

The input/output complexity of sorting and related problems

Communications of the ACM
Parallel merge sort

SIAM Journal on Computing
Scheduling multithreaded computations by work stealing

Journal of the ACM (JACM)
Cache-Oblivious Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Executing functional programs on a virtual tree of processors

FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
Provably good multicore cache performance for divide-and-conquer algorithms

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Fundamental parallel algorithms for private-cache chip multiprocessors

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
A Bridging Model for Multi-core Computing

ESA '08 Proceedings of the 16th annual European symposium on Algorithms
Brief announcement: low depth cache-oblivious sorting

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures

Scheduling irregular parallel computations on hierarchical caches

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Revisiting the cache miss analysis of multithreaded algorithms

LATIN'12 Proceedings of the 10th Latin American international conference on Theoretical Informatics
Brief announcement: efficient cache oblivious algorithms for randomized divide-and-conquer on the multicore model

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
A parallel buffer tree

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Work-stealing with configurable scheduling strategies

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a new deterministic sorting algorithm that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log n log log n), which improves on previous bounds for deterministic sample sort. Given a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, our algorithm can be scheduled effectively on these p cores in a cache-oblivious manner. We improve on the above cache-oblivious processor-aware parallel implementation by using the Priority Work Stealing Scheduler (PWS) that we presented recently in a companion paper [12]. The PWS scheduler is both processor- and cache-oblivious (i.e., resource oblivious), and it tolerates asynchrony among the cores. Using PWS, we obtain a resource oblivious scheduling of our sorting algorithm that matches the performance of the processor-aware version. Our analysis includes the delay incurred by false-sharing. We also establish good bounds for our algorithm with the randomized work stealing scheduler.