Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler blockability of numerical algorithms
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Tolerating latency through software-controlled data prefetching
Tolerating latency through software-controlled data prefetching
Disk-directed I/O for MIMD Multiprocessors
Disk-directed I/O for MIMD Multiprocessors
Improving the performance of virtual memory computers.
Improving the performance of virtual memory computers.
Tuning the performance of I/O-intensive parallel applications
Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference
An interprocedural framework for placement of asynchronous I/O operations
ICS '96 Proceedings of the 10th international conference on Supercomputing
An I/O network architecture of the distributed shared-memory massively parallel computer JUMP-1
ICS '97 Proceedings of the 11th international conference on Supercomputing
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the fifth workshop on I/O in parallel and distributed systems
A General Interprocedural Framework for Placement of Split-Phase Large Latency Operations
IEEE Transactions on Parallel and Distributed Systems
Compiling object-oriented data intensive applications
Proceedings of the 14th international conference on Supercomputing
A novel application development environment for large-scale scientific computations
Proceedings of the 14th international conference on Supercomputing
IEEE Transactions on Parallel and Distributed Systems
Compiler-Directed Collective-I/O
IEEE Transactions on Parallel and Distributed Systems
An Experimental Evaluation of I/O Optimizations on Different Applications
IEEE Transactions on Parallel and Distributed Systems
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets
The Journal of Supercomputing
Data parallel language and compiler support for data intensive applications
Parallel Computing - Parallel data-intensive algorithms and applications
An Experimental Evaluation of I/O Optimizations on Different Applications
IEEE Transactions on Parallel and Distributed Systems
Compiler-Directed I/O Optimization
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Compiling Data Intensive Applications with Spatial Coordinates
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Parallel Input/Output with Heterogeneous Disks
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
A Collective I/O Scheme Based on Compiler Analysis
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Toward Compiler Support for Scalable Parallelism Using Multipartitioning
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Disk Resident Arrays: An Array-Oriented I/O Library for Out-Of-Core Computations
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Sourcebook of parallel computing
Performance modeling and optimization of parallel out-of-core tensor contractions
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
The MHETA Execution Model for Heterogeneous Clusters
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Discretionary Caching for I/O on Clusters
Cluster Computing
Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
A parallel dynamic programming algorithm on a multi-core architecture
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Compiler and middleware support for scalable data mining
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Computation mapping for multi-level storage cache hierarchies
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Cashing in on hints for better prefetching and caching in PVFS and MPI-IO
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Practical loop transformations for tensor contraction expressions on multi-level memory hierarchies
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Optimal multi-image processing streaming framework on parallel heterogeneous systems
EG PGV'11 Proceedings of the 11th Eurographics conference on Parallel Graphics and Visualization
Compiler-directed file layout optimization for hierarchical storage systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compiler-directed file layout optimization for hierarchical storage systems
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.00 |
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows processors independent access to their own data and a corresponding compilation strategy that integrates data-parallel computation with data distribution for out-of-core problems. Our results compare several communication methods and I/O optimizations using two out-of-core problems, Jacobi iteration and LU factorization.