Scaling computations on emerging massive-core supercomputers is a daunting task, and system I/O capabilities that lag significantly behind compute capacity further degrade applications' end-to-end performance. The I/O bottleneck often negates the potential performance benefits of assigning additional compute cores to an application. In this paper, we address this issue with a novel functional partitioning (FP) runtime environment that allocates cores to specific application tasks (checkpointing, de-duplication, and scientific data format transformation), so that the deluge of cores can be brought to bear on the entire gamut of application activities. The focus is on using the extra cores to support HPC application I/O activities and on leveraging solid-state disks in this context. For example, our evaluation shows that using FP to dedicate one core on an eight-core machine to checkpointing and its assist tasks can improve the overall execution time of a FLASH benchmark on 80 and 160 cores by 43.95% and 41.34%, respectively.
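The core idea of functional partitioning is that the compute code hands a checkpoint off to a dedicated core and keeps running, while that core performs assist tasks such as de-duplication before the data reaches storage. The following is a minimal Python sketch of that hand-off, not the FP runtime itself. It assumes a single helper thread stands in for the dedicated core, SHA-256 content hashing over fixed 4 KiB chunks as the de-duplication scheme, and an in-memory dict in place of an SSD or parallel file system. `CheckpointHelper`, `CHUNK_SIZE`, and `simulate` are hypothetical names.

```python
import hashlib
import queue
import threading

CHUNK_SIZE = 4096  # hypothetical de-duplication granularity in bytes


class CheckpointHelper:
    """Hypothetical helper owning one 'dedicated core' (here, a thread).

    Compute code submits checkpoint snapshots and continues immediately;
    the helper splits each snapshot into fixed-size chunks, hashes them,
    and stores only chunks it has not seen before.
    """

    def __init__(self, max_pending=4):
        self._queue = queue.Queue(maxsize=max_pending)
        self.store = {}       # chunk digest -> chunk bytes (stand-in for SSD/disk)
        self.manifests = []   # per-checkpoint ordered list of chunk digests
        self.bytes_written = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, snapshot: bytes) -> None:
        # Blocks only when max_pending snapshots are already waiting.
        self._queue.put(snapshot)

    def close(self) -> None:
        self._queue.put(None)  # sentinel: no more checkpoints
        self._thread.join()    # wait until every queued snapshot is stored

    def _run(self) -> None:
        while True:
            snapshot = self._queue.get()
            if snapshot is None:
                break
            manifest = []
            for off in range(0, len(snapshot), CHUNK_SIZE):
                chunk = snapshot[off:off + CHUNK_SIZE]
                digest = hashlib.sha256(chunk).hexdigest()
                if digest not in self.store:  # write only unseen chunks
                    self.store[digest] = chunk
                    self.bytes_written += len(chunk)
                manifest.append(digest)
            self.manifests.append(manifest)

    def restore(self, index: int) -> bytes:
        # Rebuild checkpoint `index` from its manifest of chunk digests.
        return b"".join(self.store[d] for d in self.manifests[index])


def simulate(steps=3):
    helper = CheckpointHelper()
    state = bytearray(4 * CHUNK_SIZE)  # toy simulation state: 4 chunks
    for step in range(steps):
        state[0] = step                # only the first chunk changes per step
        helper.submit(bytes(state))    # hand off a copy; compute continues
    helper.close()
    return helper, bytes(state)
```

Running `helper, final = simulate()` stores 3 unique chunks (12,288 bytes) instead of the 12 chunks in the three raw snapshots. The all-zero chunks de-duplicate against one another, and only the first chunk changes between steps. Python threads share the interpreter lock, so they do not provide true core isolation. A real runtime would more likely use a separate process or native thread pinned to its core, for example with `os.sched_setaffinity` on Linux.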