As Grids become increasingly relied upon as critical infrastructure, it is imperative to ensure the highly available and secure day-to-day operation of the Grid infrastructure. The current approach to Grid management is generally to have geographically distributed system administrators contact each other by phone or email to debug Grid behavior and subsequently modify or reconfigure the deployed Grid software. For security-related events such as the required patching of vulnerable Grid software, this ad hoc process can take too much time, is error-prone and tedious, and is thus unlikely to solve the problems completely. In this paper, we present the application of the ANDREA management system to control Grid service functionality in near-real-time at scales of thousands of services with minimal human involvement. We show how ANDREA can be used to better ensure the security of the Grid: in experiments using 11,394 Globus Toolkit v4 deployments, we measure the performance of ANDREA for three increasingly sophisticated reactions to an intruder detection: shutting down the entire Grid; incrementally eliminating Grid service for different classes of users; and issuing and applying a patch for the vulnerability exploited by the attacker. We believe that this work is an important first step toward automating the general day-to-day monitoring and reconfiguration of all aspects of Grid deployments.