We present MRNet, a software-based multicast/reduction network for building scalable performance and system administration tools. MRNet supports multiple simultaneous, asynchronous collective communication operations. It is flexible: tool builders can tailor its process network topology to their tool's requirements and the underlying system's capabilities. It is also extensible: tool builders can add custom data reductions to its collection of built-in reductions. We evaluated MRNet in a simple test tool and also integrated it into an existing, real-world performance tool with up to 512 tool back-ends. In the real-world tool, we used MRNet not only for multicast and simple data reductions but also with custom histogram and clock skew detection reductions. In our experiments, the MRNet-based tools performed significantly better than the tools without MRNet in average message latency and throughput, overall tool start-up latency, and performance data processing throughput.
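The idea of multicast and reduction over a configurable process tree, with pluggable reductions, can be illustrated with a minimal sketch. This is not MRNet's API: the names `TOPOLOGY`, `tree_reduce`, `multicast`, and `histogram_merge` are hypothetical, the tree is simulated inside one Python process, and the example data are made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical topology (an assumption for illustration): a front-end "fe",
# two internal processes, and four tool back-ends at the leaves.
TOPOLOGY: Dict[str, List[str]] = {
    "fe": ["i0", "i1"],
    "i0": ["b0", "b1"],
    "i1": ["b2", "b3"],
}

Reduction = Callable[[Sequence], object]


def tree_reduce(topology, node, leaf_values, reduce_fn: Reduction):
    """Combine back-end values on the way up the tree rooted at `node`."""
    children = topology.get(node, [])
    if not children:
        # A back-end (leaf) contributes its own value unchanged.
        return leaf_values[node]
    # An internal process reduces the partial results of its children,
    # so the front-end only ever sees one value per direct child.
    partials = [tree_reduce(topology, c, leaf_values, reduce_fn) for c in children]
    return reduce_fn(partials)


def multicast(topology, node, message):
    """Forward `message` down the tree; return what each back-end received."""
    children = topology.get(node, [])
    if not children:
        return {node: message}
    delivered = {}
    for child in children:
        delivered.update(multicast(topology, child, message))
    return delivered


def histogram_merge(partials):
    """A custom reduction: add up per-bin counts from several histograms."""
    total = Counter()
    for hist in partials:
        total.update(hist)
    return dict(total)


if __name__ == "__main__":
    samples = {"b0": 1, "b1": 2, "b2": 3, "b3": 4}
    print(tree_reduce(TOPOLOGY, "fe", samples, sum))   # built-in style reduction
    print(tree_reduce(TOPOLOGY, "fe", samples, max))

    hists = {
        "b0": {"fast": 2},
        "b1": {"fast": 1, "slow": 1},
        "b2": {"slow": 3},
        "b3": {"fast": 1},
    }
    print(tree_reduce(TOPOLOGY, "fe", hists, histogram_merge))
    print(multicast(TOPOLOGY, "fe", "start-sampling"))
```

Because the reduction is passed in as a function, swapping `sum` for `histogram_merge` changes what is aggregated without touching the tree traversal, and editing `TOPOLOGY` changes the fan-out without touching the reductions. The sketch only works for reductions that can be applied to partial results at each level, as sum, max, and histogram merging can.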