Permuting data on random-access block storage

Authors:
Risi Thonangi;Jun Yang
Affiliations:
Duke University;Duke University
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

Permutation is a fundamental operator for array data, with applications such as changing matrix layouts and reorganizing data cubes. We consider the problem of permuting large quantities of data stored on secondary storage that supports fast random block accesses, such as solid-state drives and distributed key-value stores. Faster random accesses open up new opportunities for permutation. Although external merge sort has often been used for permutation, it is overkill: it fails to fully exploit the properties of permutation and carries unnecessary overhead in storing and comparing keys. We propose faster algorithms with lower memory requirements for a large, useful class of permutations. We also tackle practical challenges that traditional permutation algorithms have not addressed, such as exploiting random block accesses more aggressively, accounting for the cost asymmetry between reads and writes, and handling arbitrary data dimension sizes (as opposed to the perfect powers often assumed by previous work). As a result, our algorithms are faster and more broadly applicable.
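To make the contrast with sorting concrete, the following is a minimal in-memory sketch, not the paper's block-level algorithms. It assumes a row-major array and a permutation that reorders dimensions (matrix transposition being the two-dimensional case). The function names `permute_dims` and `permute_by_sort` are hypothetical and chosen only for illustration. The first computes each source address by index arithmetic, so no keys are stored or compared; the second mimics a sort-based approach that tags every element with a destination key.

```python
from itertools import product


def permute_dims(data, shape, perm):
    """Reorder the dimensions of a row-major array by direct address arithmetic.

    data  : flat list holding the array in row-major order
    shape : dimension sizes, e.g. (3, 5); they need not be powers of two
    perm  : perm[k] is the source dimension that becomes output dimension k
    """
    out_shape = [shape[p] for p in perm]
    # Row-major strides of the source layout.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    out = []
    # Visit output positions in order and compute each source address directly.
    for idx in product(*(range(n) for n in out_shape)):
        src = sum(i * strides[p] for i, p in zip(idx, perm))
        out.append(data[src])
    return out


def permute_by_sort(data, shape, perm):
    """Sort-based baseline: attach a destination key to every element, then sort."""
    keyed = []
    # product() enumerates source indices in row-major order, matching `data`.
    for pos, idx in enumerate(product(*(range(n) for n in shape))):
        key = tuple(idx[p] for p in perm)
        keyed.append((key, data[pos]))
    keyed.sort(key=lambda kv: kv[0])  # key storage and comparisons happen here
    return [value for _, value in keyed]


# Transpose a 2 x 3 matrix stored as [[0, 1, 2], [3, 4, 5]].
transposed = permute_dims([0, 1, 2, 3, 4, 5], (2, 3), (1, 0))
```

Both functions produce the same layout; the sketch only illustrates why key storage and comparison are avoidable when the destination of every element is known in advance. It does not model block accesses, memory limits, or read/write cost asymmetry.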