ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
EEL: machine-independent executable editing
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The MOSIX multicomputer operating system for high performance cluster computing
Future Generation Computer Systems - Special issue on HPCN '97
Fine-grained dynamic instrumentation of commodity operating system kernels
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Linger Longer: fine-grain cycle stealing for networks of workstations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
GILK: A Dynamic Instrumentation Tool for the Linux Kernel
TOOLS '02 Proceedings of the 12th International Conference on Computer Performance Evaluation, Modelling Techniques and Tools
Self-Monitoring and Self-Adapting Operating Systems
HOTOS '97 Proceedings of the 6th Workshop on Hot Topics in Operating Systems (HotOS-VI)
Fast concurrent dynamic linking for an adaptive operating system
ICCDS '96 Proceedings of the 3rd International Conference on Configurable Distributed Systems
Enabling autonomic behavior in systems software with hot swapping
IBM Systems Journal
Providing dynamic update in an operating system
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Hi-index | 0.00 |
High-performance computing clusters running long-lived tasks currently cannot have kernel software updates applied to them without causing system downtime. These clusters miss opportunities for increased performance via specialized kernel support, cannot benefit from new kernel features, and continue to operate with kernel security holes unpatched, at least until the next scheduled maintenance date. We developed a system enabling dynamic kernel updates in parallel computing clusters to address these problems. Our system, DynAMOS, is founded on execution flow highjacking through function cloning. It enables commodity operating systems popularly used in clusters gain adaptive and mutative capabilities. To demonstrate the efficacy of our system, we illustrate our experience in dynamically updating and extending a Linux cluster. We introduce adaptive memory paging for efficient gang-scheduling, extend the kernel's process scheduler to support unobtrusive fine-grain cycle stealing, apply public security fixes, and inject performance monitoring functionality to a selection of kernel functions. Our benchmarks show that the overhead imposed by DynAMOS is mostly in the range of 1-8% for common Linux kernel functions.