Profile-guided I/O partitioning

Authors:
Yijian Wang; David Kaeli
Affiliations:
Northeastern University, Boston, MA; Northeastern University, Boston, MA
Venue:
ICS '03: Proceedings of the 17th Annual International Conference on Supercomputing
Year:
2003

Abstract

In the field of high-performance computing there is a growing need to process large, complex datasets. Many of these applications are file-intensive workloads that perform a large number of reads from and writes to a small number of files. When these workloads run on cluster-based systems, performance cannot be scaled simply by adding compute nodes, because access to the shared files, rather than computation, limits throughput. To exploit parallel resources effectively, file I/O must be parallelized as well. The potential benefit of parallel I/O grows as the gap between CPU and disk speeds continues to widen.

While parallel I/O middleware (e.g., MPI-IO) gives users an environment in which large datasets can be shared among multiple distributed processes, the performance of file-intensive applications depends heavily on how the data is accessed and where it is physically located on disk. I/O operations therefore need to be parallelized both at the application level (using middleware) and at the disk level (using partitioning).

In this paper, we present a new profile-guided greedy partitioning algorithm that parallelizes I/O access for file-intensive applications running on cluster-based systems. We use MPI and MPI-IO to provide parallelization at the application level, and we use I/O profiling to capture relevant information about the I/O stream. These profiles then guide the partitioning of files across multiple disks, significantly improving I/O throughput.
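The abstract does not specify the profile format or the exact greedy criterion, so the following is only a minimal sketch under stated assumptions of what "profile-guided greedy partitioning" can look like. It assumes a profiled trace of `(rank, offset, length)` requests against one shared file, aggregates it into bytes accessed per fixed-size block, and then places blocks hottest-first onto whichever disk currently carries the least load. That placement rule is the standard longest-processing-time-first (LPT) heuristic, swapped in for illustration; it is not claimed to be the authors' algorithm. The names `IORecord`, `block_access_counts`, and `greedy_partition` are hypothetical.

```python
"""Hypothetical sketch: profile-guided greedy placement of file blocks on disks."""
from collections import Counter
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class IORecord:
    """One profiled I/O request (assumed trace format, not from the paper)."""
    rank: int    # MPI rank that issued the request
    offset: int  # byte offset into the shared file
    length: int  # number of bytes read or written


def block_access_counts(trace, block_size):
    """Sum the bytes each fixed-size file block receives across the trace."""
    counts = Counter()
    for rec in trace:
        if rec.length <= 0:
            continue  # zero-length requests touch no data
        first = rec.offset // block_size
        last = (rec.offset + rec.length - 1) // block_size
        for block in range(first, last + 1):
            # Clip the request to this block's byte range.
            lo = max(rec.offset, block * block_size)
            hi = min(rec.offset + rec.length, (block + 1) * block_size)
            counts[block] += hi - lo
    return counts


def greedy_partition(counts, num_disks):
    """Place blocks hottest-first, each on the disk with the least load so far."""
    heap = [(0, disk) for disk in range(num_disks)]  # (bytes assigned, disk id)
    heapq.heapify(heap)
    placement = {}
    # Descending by bytes; ties broken by block index so output is deterministic.
    for block, load in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        disk_load, disk = heapq.heappop(heap)
        placement[block] = disk
        heapq.heappush(heap, (disk_load + load, disk))
    return placement


if __name__ == "__main__":
    trace = [
        IORecord(rank=0, offset=0, length=8192),
        IORecord(rank=1, offset=8192, length=4096),
        IORecord(rank=0, offset=0, length=4096),
        IORecord(rank=2, offset=12288, length=2048),
    ]
    counts = block_access_counts(trace, block_size=4096)
    print(dict(counts))                 # {0: 8192, 1: 4096, 2: 4096, 3: 2048}
    print(greedy_partition(counts, 2))  # {0: 0, 1: 1, 2: 1, 3: 0}
```

In the example, the two disks end up with 10240 and 8192 bytes of profiled traffic. This sketch balances only total bytes per disk; a practical partitioner would likely also weigh which processes touch each block and whether their requests overlap in time, which the simple LPT rule ignores.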