A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
The input/output complexity of sorting and related problems
Communications of the ACM
Optimal disk I/O with parallel block transfer
STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
Journal of Parallel and Distributed Computing - Special issue on parallel I/O systems
Virtual memory for data-parallel computing
Virtual memory for data-parallel computing
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
External memory algorithms and data structures: dealing with massive data
ACM Computing Surveys (CSUR)
Introduction to Algorithms
Towards a theory of cache-efficient algorithms
Journal of the ACM (JACM)
On the limits of cache-obliviousness
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Towards an Optimal Bit-Reversal Permutation Program
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
The Grid 2: Blueprint for a New Computing Infrastructure
The Grid 2: Blueprint for a New Computing Infrastructure
Hierarchical memory with block transfer
SFCS '87 Proceedings of the 28th Annual Symposium on Foundations of Computer Science
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Hi-index | 5.23 |
Permuting a vector is a fundamental primitive which arises in many applications. In particular, rational permutations, which are defined by permutations of the bits of the binary representations of the vector indices, are widely used. Matrix transposition and bit-reversal are notable examples of rational permutations. In this paper we contribute a number of results regarding the execution of these permutations in cache hierarchies, with particular emphasis on the cache-oblivious setting. We first bound from below the work needed to execute a rational permutation with an optimal cache complexity. Then, we develop a cache-oblivious algorithm to perform any rational permutation, which exhibits optimal work and cache complexities under the tall cache assumption. We finally show that for certain families of rational permutations (including matrix transposition and bit reversal) no cache-oblivious algorithm can exhibit optimal cache complexity for all values of the cache parameters. This latter result specializes the one proved by Brodal and Fagerberg for general permutations to the case of rational permutations, and provides further evidence that the tall cache assumption is often necessary to attain cache optimality in the context of cache-oblivious algorithms.