Multi-dimensional integrals of products of several arrays arise in certain scientific computations. In the context of these integral calculations, this paper addresses a memory usage minimization problem. Based on a framework that models the relationship between loop fusion and memory usage, we propose an algorithm for finding a loop fusion configuration that minimizes memory usage. A practical example shows the performance improvement obtained by our algorithm on an electronic structure computation.
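The link between loop fusion and memory usage can be illustrated with a small sketch. This is not the paper's algorithm or its framework; it is a hypothetical toy example (the function names `unfused` and `fused`, the arrays `A`, `B`, `C`, and the storage count returned are all assumptions made for illustration). It evaluates the sum over i of (sum over j of A[i][j] * B[j]) * C[i] in two ways: with the intermediate stored as a full array, and with the producer and consumer loops fused so that the intermediate shrinks to a scalar.

```python
def unfused(A, B, C):
    """Producer and consumer loops are separate; the intermediate T is a full array."""
    n = len(A)
    T = [0.0] * n                      # intermediate array: n elements live at once
    for i in range(n):                 # producer loop nest
        for j in range(len(B)):
            T[i] += A[i][j] * B[j]
    s = 0.0
    for i in range(n):                 # consumer loop
        s += T[i] * C[i]
    return s, len(T)                   # result, intermediate elements stored


def fused(A, B, C):
    """The i-loops are fused, so the intermediate contracts to one scalar."""
    s = 0.0
    for i in range(len(A)):
        t = 0.0                        # intermediate: a single element
        for j in range(len(B)):
            t += A[i][j] * B[j]
        s += t * C[i]                  # consumed immediately in the same iteration
    return s, 1                        # result, intermediate elements stored


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [1.0, 1.0]
    C = [2.0, 3.0]
    print(unfused(A, B, C))
    print(fused(A, B, C))
```

Both versions compute the same value; only the storage needed for the intermediate differs. With many arrays and several candidate loops to fuse, different fusion choices can conflict, which is why a search for a memory-minimal fusion configuration is needed.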