Global arrays: a nonuniform memory access programming model for high-performance computers
The Journal of Supercomputing
T2: a customizable parallel database for multi-dimensional data
ACM SIGMOD Record
The multidimensional database system RasDaMan
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
PARSIMONY: An infrastructure for parallel multidimensional analysis and data mining
Journal of Parallel and Distributed Computing - Special issue on high-performance data mining
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery
Multidimensional Database Technology
Computer
An extendible multidimensional array system for MOLAP
Proceedings of the 2006 ACM symposium on Applied computing
MAD skills: new analysis practices for big data
Proceedings of the VLDB Endowment
A demonstration of SciDB: a science-oriented DBMS
Proceedings of the VLDB Endowment
Skew-resistant parallel processing of feature-extracting scientific user-defined functions
Proceedings of the 1st ACM symposium on Cloud computing
Overview of sciDB: large scale array storage, processing and analysis
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Scalable clustering algorithm for N-body simulations in a shared-nothing cluster
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
ArrayStore: a storage manager for complex parallel array processing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Distribution rules for array database queries
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
ArrayStore: a storage manager for complex parallel array processing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
SIDR: structure-aware intelligent data routing in Hadoop
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Whether in business or science, multi-dimensional arrays are a common abstraction in data analytics and many systems exist for efficiently processing arrays. As dataset grow in size, it is becoming increasingly important to process these arrays in parallel. In this paper, we discuss different types of array operations and review how they can be processed in parallel using two different existing techniques. The first technique, which we call merge, consists in partitioning an array, processing the partitions in parallel, then merging the results to reconcile computations that span partition boundaries. The second technique, which we call overlap, consists in partitioning an array into subarrays that overlap by a given number of cells along each dimension. Thanks to this overlap, the array partitions can be processed in parallel without any merge phase. We discuss when each technique can be applied to an array operation. We show that even for a single array operation, a different approach may yield the best performance for different regions of an array. Following this observation, we introduce a new parallel array processing technique that combines the merge and overlap approaches. Our technique enables a parallel array processing system to mix-and-match the merge and overlap techniques within a single operation on an array. Through experiments on real, scientific data, we show that this hybrid approach outperforms the other two techniques.