Massively parallel file systems must provide high-bandwidth file access to programs running on their machines. Most accomplish this goal by striping files across arrays of disks attached to a few specialized I/O nodes in the massively parallel processor (MPP). This arrangement requires programmers to give the file system many hints on how their data should be laid out on disk if they want to achieve good performance. Additionally, the custom interface makes massively parallel file systems hard for programmers to use and difficult to integrate seamlessly into an environment with workstations and tertiary storage. The RAMA file system addresses these problems by providing a massively parallel file system that does not need user hints to provide good performance. RAMA takes advantage of the recent decrease in physical disk size by assuming that each processor in an MPP has one or more disks attached to it. Hashing is then used to pseudo-randomly distribute data to all of these disks, ensuring high bandwidth regardless of access pattern. Since MPP programs often have many nodes accessing a single file in parallel, the file system must allow access to different parts of the file without relying on a particular node. In RAMA, a file request involves only two nodes: the node making the request and the node on whose disk the data is stored. Thus, RAMA scales well to hundreds of processors. Since RAMA needs no layout hints from applications, it fits well into systems where users cannot (or will not) provide such hints. Fortunately, this flexibility does not cause a large loss of performance. RAMA's simulated performance is within 10-15% of the optimum performance of a similarly sized striped file system, and is a factor of 4 or more better than a striped file system with poorly laid out data.
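The idea of hashing data to disks can be illustrated with a minimal sketch. It is not RAMA's actual hash function, which the abstract does not specify. The sketch assumes one disk per node, placement at the granularity of a single block, and SHA-256 as a stand-in hash; the names `node_for_block`, `striped_node`, and `load_per_node` are hypothetical. It compares hashed placement with simple round-robin striping for a strided access pattern whose stride equals the number of nodes, a layout that is unfavorable for striping.

```python
import hashlib
from collections import Counter

NUM_NODES = 64  # assumed MPP size, one disk per node


def node_for_block(file_id: int, block_no: int, num_nodes: int) -> int:
    """Hashed placement (illustrative): any node can compute the owner
    of a block locally, so a request involves only the requester and
    the node that stores the data."""
    key = f"{file_id}:{block_no}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def striped_node(block_no: int, num_nodes: int) -> int:
    """Round-robin striping, used here only as a point of comparison."""
    return block_no % num_nodes


def load_per_node(placement, block_numbers) -> Counter:
    """Count how many of the requested blocks land on each node."""
    return Counter(placement(b) for b in block_numbers)


# Strided access: every 64th block, 1024 blocks in total.
strided = range(0, NUM_NODES * 1024, NUM_NODES)

hashed_load = load_per_node(
    lambda b: node_for_block(7, b, NUM_NODES), strided
)
striped_load = load_per_node(
    lambda b: striped_node(b, NUM_NODES), strided
)

print("nodes used with hashing: ", len(hashed_load))
print("nodes used with striping:", len(striped_load))
```

With striping, every block in this pattern maps to the same node, so a single disk serves all requests. With hashing, the same requests are spread over many nodes without any layout hint from the application.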