CCS Resource Management in Networked HPC Systems

Authors:
Axel Keller; Alexander Reinefeld
Venue:
HCW '98 Proceedings of the Seventh Heterogeneous Computing Workshop
Year:
1998

Abstract

CCS is a resource management system for parallel high-performance computers. At the user level, CCS provides vendor-independent access to parallel systems. At the system administrator level, CCS offers tools for controlling (i.e., specifying, configuring, and scheduling) the system components that are operated in a computing center; hence the name Computing Center Software. CCS provides: hardware-independent scheduling of interactive and batch jobs; partitioning of exclusive and non-exclusive resources; open, extensible interfaces to other resource management systems; a high degree of reliability (e.g., automatic restart of crashed daemons); and fault tolerance in the case of network breakdowns. In this paper, we describe CCS as one important component for the access, job distribution, and administration of networked HPC systems in a metacomputing environment.