We describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generating precision event logs that can be used to provide detailed end-to-end application and system level monitoring; a Java agent-based system for managing the large amount of logging data; and tools for visualizing the log data and real-time state of the distributed system. We developed these tools for analyzing a high-performance distributed system centered around the transfer of large amounts of data at high speeds from a distributed storage server to a remote visualization client. However, this methodology should be generally applicable to any distributed system. This methodology, called NetLogger, has proven invaluable for diagnosing problems in networks and in distributed systems code. This approach is novel in that it combines network, host, and application-level monitoring, providing a complete view of the entire system.
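To make the idea of precision event logs for end-to-end monitoring concrete, the following is a minimal Python sketch. It is not NetLogger's actual API or log format: the function names (`format_event`, `parse_event`, `stage_latencies`), the key=value line layout, the event names, and the `block` identifier are all hypothetical choices made for illustration. The sketch also assumes that host clocks are synchronized closely enough for cross-host time differences to be meaningful, and that field values contain no spaces.

```python
import socket
import time


def format_event(event, prog, ts=None, host=None, **fields):
    """Return one key=value log line for a named event (hypothetical format)."""
    ts = time.time() if ts is None else ts
    host = socket.gethostname() if host is None else host
    parts = [f"ts={ts:.6f}", f"host={host}", f"prog={prog}", f"event={event}"]
    parts += [f"{k}={v}" for k, v in sorted(fields.items())]
    return " ".join(parts)


def parse_event(line):
    """Parse a key=value line back into a dict; the timestamp becomes a float."""
    record = dict(item.split("=", 1) for item in line.split())
    record["ts"] = float(record["ts"])
    return record


def stage_latencies(lines, key="block"):
    """Group events by an identifier and return the elapsed time between
    consecutive events for each identifier, ordered by timestamp."""
    by_id = {}
    for rec in map(parse_event, lines):
        if key in rec:  # skip events that do not carry the identifier
            by_id.setdefault(rec[key], []).append(rec)
    result = {}
    for ident, recs in by_id.items():
        recs.sort(key=lambda r: r["ts"])
        result[ident] = [
            (a["event"], b["event"], round(b["ts"] - a["ts"], 6))
            for a, b in zip(recs, recs[1:])
        ]
    return result


# Illustrative events from a storage server and a visualization client.
# Timestamps and host names are made up for the example.
log = [
    format_event("server.send_start", "storage", ts=10.000, host="srv", block=7),
    format_event("server.send_end", "storage", ts=10.020, host="srv", block=7),
    format_event("client.recv_end", "viewer", ts=10.065, host="cli", block=7),
]

for ident, stages in stage_latencies(log).items():
    for start, end, seconds in stages:
        print(f"block {ident}: {start} -> {end} took {seconds:.3f} s")
```

Tagging every event with a shared identifier lets application, host, and network stages be lined up on one timeline, so the slowest stage of a transfer stands out directly from the per-stage differences.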