A lack of efficient system software is an increasing impediment to deploying large-scale parallel and distributed systems. Systemically addressing operating-system-induced performance anomalies requires accurate, low-overhead, whole-system monitoring, which is currently unavailable in large tightly-coupled systems. In this paper, we present the design of IMPuLSE (Integrated Monitoring and Profiling for Large-Scale Environments), a system we are developing to meet this need. IMPuLSE's innovative message-centric profiling approach trades centralized global knowledge for low overhead, while retaining relatively fine-grained information about important cross-host performance interactions. The goal of this approach is to enable both large-scale system software adaptation and continuous system performance auditing.