This paper presents a unified evaluation of the I/O behavior of a commercial clustered DSM machine, the HP Exemplar. Our study has two objectives: 1) to evaluate the impact of the interacting system components, namely the architecture, operating system, and programming model, on the overall I/O behavior and to identify possible performance bottlenecks, and 2) to give users hints for achieving high out-of-the-box I/O throughput. We find that for DSM machines built as a cluster of SMP nodes, integrated clustering of computing and I/O resources, in both hardware and software, is not advantageous, for two reasons. First, within an SMP node, the I/O bandwidth is often restricted by the performance of the peripheral components and cannot match the memory bandwidth. Second, because the I/O resources are shared globally, file-access costs become nonuniform, and the I/O behavior of the entire system degrades in terms of both scalability and balance.

We observe that buffered I/O performance is determined not only by the I/O subsystem but also by the programming model, the global shared-memory subsystem, and the data-communication mechanism. Moreover, programming-model support can be used effectively to overcome the performance constraints created by the architecture and operating system. For example, on the HP Exemplar, users can achieve high I/O throughput by using features of the programming model that balance the sharing and locality of the user buffers and file systems. Finally, we believe that I/O subsystems are at present being designed in isolation, and that the traditional memory-oriented design approach needs to be amended to address this problem.