Communications of the ACM
OCM—a monitoring system for interoperable tools
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
MIMO — An infrastructure for monitoring and managing distributed middleware environments
IFIP/ACM International Conference on Distributed systems platforms
A Framework for an Interoperable Tool Environment (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
OMIS 2.0 - A Universal Interface for Monitoring Systems
Proceedings of the 4th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Hi-index | 0.00 |
Resource management systems and tool support are two important factors for efficiently developing applications in large clusters. On the one hand, management systems (in the form of batch queue systems) are responsible for all issues related to executing jobs on the existing machines. On the other hand, run-time tools (in the form of debuggers, tracers, performance analyzers, etc.) are used to guarantee the correctness and the efficiency of execution. Executing an application under the control of both a resource management system and a run-time tool is still a challenging problem in most cases. Using run-time tools might be difficult or even impossible in usual environments due to the restrictions imposed by resource managers. We propose TDP-Shell as a framework for providing the necessary mechanisms to enable and simplify using run-time tools under a specific resource management system. We have analyzed the essential interactions between common run-time tools and resource management systems and implemented a pilot TDP-Shell. The paper describes the main components of TDP-Shell and its use with some illustrative examples.