A case for virtual machine based fault injection in a high-performance computing environment
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
A Scalable Parallel Debugging Library with Pluggable Communication Protocols
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Runtime MPI collective checking with tree-based overlay networks
Proceedings of the 20th European MPI Users' Group Meeting
Distributed wait state tracking for runtime MPI deadlock detection
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
The Scalable Tools Communication Infrastructure (STCI) is an open source collaborative effort intended to provide high-performance, scalable, resilient, and portable communications and process control services for a wide variety of user and system tools. STCI is aimed specifically at tools for ultrascale computing and uses a component architecture to simplify tailoring the infrastructure to a wide range of scenarios. This paper describes STCI's design philosophy, the various components that will be used to provide an STCI implementation for a range of ultrascale platforms, and a range of tool types. These include tools supporting parallel run-time environments, such as MPI, parallel application correctness tools and performance analysis tools, as well as system monitoring and management tools.