The NetLogger Methodology for High Performance Distributed Systems Performance Analysis

Authors:
Brian Tierney;William Johnston;Brian Crowley;Gary Hoo;Chris Brooks;Dan Gunter
Venue:
HPDC '98 Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing
Year:
1998

Citing 0
Cited 28

Adaptive performance prediction for distributed data-intensive applications

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
A scalable SNMP-based distributed monitoring system for heterogeneous network computing

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Using high-speed WANs and network data caches to enable remote and distributed visualization

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Distributed computing research issues in grid computing

ACM SIGACT News
A Monitoring Sensor Management System for Grid Environments

Cluster Computing
The MAGNeT Toolkit: Design, Implementation and Evaluation

The Journal of Supercomputing
Treemaps for Workload Visualization

IEEE Computer Graphics and Applications
Object-Centered Constraints

Proceedings of the Seventh International Conference on Data Engineering
WEBARM: Mobile Code Based Agent for Web Application Response Measurement - Software Implementations and Analysis

MMNS '01 Proceedings of the 4th IFIP/IEEE International Conference on Management of Multimedia Networks and Services: Management of Multimedia on the Internet
An Infrastructure for Monitoring and Management in Computational Grids

LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
A TCP tuning daemon

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Monitoring data archives for grid environments

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Grid high performance networking in the DataGRID project

Future Generation Computer Systems - Special section: Selected papers from the TERENA networking conference 2002
Dynamic Monitoring of High-Performance Distributed Applications

HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
GridMapper: A Tool for Visualizing the Behavior of Large-Scale Distributed Systems

HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing
Performance debugging for distributed systems of black boxes

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
On-Demand Grid Application Tuning and Debugging with the NetLogger Activation Service

GRID '03 Proceedings of the 4th International Workshop on Grid Computing
Enabling Network Measurement Portability Through a Hierarchy of Characteristics

GRID '03 Proceedings of the 4th International Workshop on Grid Computing
IMPuLSE: integrated monitoring and profiling for large-scale environments

LCR '04 Proceedings of the 7th workshop on languages, compilers, and run-time support for scalable systems
Grid Network Monitoring in the European Datagrid Project

International Journal of High Performance Computing Applications
WAP5: black-box performance debugging for wide-area systems

Proceedings of the 15th international conference on World Wide Web
Certificate-based access control for widely distributed resources

SSYM'99 Proceedings of the 8th conference on USENIX Security Symposium - Volume 8
Scalability analysis of three monitoring and information systems: MDS2, R-GMA, and Hawkeye

Journal of Parallel and Distributed Computing
mBrace: action-based performance monitoring of multi-tier web applications

Proceedings of the Third Workshop on Dependable Distributed Data Management
Archive migration through workflow automation

PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Distributed general logging architecture for grid environments

VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Online event correlations analysis in system logs of large-scale cluster systems

NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
DeWiz - event-based debugging on the grid

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing

Abstract

We describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generating precision event logs that can be used to provide detailed end-to-end application- and system-level monitoring; a Java agent-based system for managing the large amount of logging data; and tools for visualizing the log data and real-time state of the distributed system. We developed these tools for analyzing a high-performance distributed system centered around the transfer of large amounts of data at high speeds from a distributed storage server to a remote visualization client. However, this methodology should be generally applicable to any distributed system. This methodology, called NetLogger, has proven invaluable for diagnosing problems in networks and in distributed systems code. This approach is novel in that it combines network, host, and application-level monitoring, providing a complete view of the entire system.