MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools

Authors:
Philip C. Roth; Dorian C. Arnold; Barton P. Miller
Affiliations:
University of Wisconsin, Madison (all authors)
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Abstract

We present MRNet, a software-based multicast/reduction network for building scalable performance and system administration tools. MRNet supports multiple simultaneous, asynchronous collective communication operations. MRNet is flexible, allowing tool builders to tailor its process network topology to suit their tool's requirements and the underlying system's capabilities. MRNet is extensible, allowing tool builders to incorporate custom data reductions to augment its collection of built-in reductions. We evaluated MRNet both in a simple test tool and integrated into an existing, real-world performance tool with up to 512 tool back-ends. In the real-world tool, we used MRNet not only for multicast and simple data reductions but also with custom histogram and clock skew detection reductions. In our experiments, the MRNet-based tools showed significantly better performance than the tools without MRNet for average message latency and throughput, overall tool start-up latency, and performance data processing throughput.
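
The general idea of a tree-shaped multicast/reduction network with pluggable reductions can be illustrated with a small, single-process Python sketch. This is not MRNet's API or implementation (MRNet itself is not written in Python, and its interfaces are not described in this abstract). Every name here (`Node`, `build_tree`, `multicast`, `reduce_up`, `merge_histograms`) is hypothetical and chosen only for illustration, and the sketch assumes a balanced tree with a fixed fan-out and reductions that can be applied level by level.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, List

# A reduction combines the values arriving from a node's children into one value.
Reduction = Callable[[List[Any]], Any]


@dataclass
class Node:
    """One process in a hypothetical tool tree; leaves stand in for tool back-ends."""
    name: str
    children: List["Node"] = field(default_factory=list)
    value: Any = None                      # data held by a leaf (back-end)
    inbox: List[Any] = field(default_factory=list)


def build_tree(num_leaves: int, fanout: int) -> Node:
    """Build a tree bottom-up so that each internal node has at most `fanout` children."""
    if num_leaves < 1 or fanout < 2:
        raise ValueError("need at least one leaf and a fan-out of at least 2")
    level = [Node(f"backend-{i}") for i in range(num_leaves)]
    depth = 0
    while len(level) > 1:
        level = [
            Node(f"internal-{depth}-{i // fanout}", children=level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
        depth += 1
    return level[0]


def leaves(node: Node) -> List[Node]:
    """Return the leaves below `node` in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def multicast(node: Node, message: Any) -> None:
    """Deliver `message` to `node` and forward it down to every descendant."""
    node.inbox.append(message)
    for child in node.children:
        multicast(child, message)


def reduce_up(node: Node, reduction: Reduction) -> Any:
    """Combine leaf values on the way up, applying `reduction` at each internal node."""
    if not node.children:
        return node.value
    return reduction([reduce_up(child, reduction) for child in node.children])


def merge_histograms(parts: List[Counter]) -> Counter:
    """Example of a custom reduction: add per-bucket counts from all children."""
    total: Counter = Counter()
    for part in parts:
        total.update(part)
    return total


if __name__ == "__main__":
    root = build_tree(num_leaves=8, fanout=2)
    multicast(root, "start-sampling")

    # Built-in style reductions: each back-end contributes one number.
    for i, leaf in enumerate(leaves(root)):
        leaf.value = i
    print(reduce_up(root, sum), reduce_up(root, max))

    # Custom reduction: each back-end contributes a small histogram.
    for i, leaf in enumerate(leaves(root)):
        leaf.value = Counter({i % 2: 1})
    print(reduce_up(root, merge_histograms))
```

In this sketch the front-end (the root) never touches more than `fanout` partial results at a time, which is the property that makes a tree attractive as the number of back-ends grows. A real system would run each node as a separate process and handle multiple concurrent operations, neither of which is modeled here.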