Mapping applications with collectives over sub-communicators on torus networks
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Task mapping on torus networks has traditionally focused on reducing either the maximum dilation or the average number of hops per byte for messages in an application. These metrics make simplifying assumptions about the causes of network congestion and do not correlate accurately with execution time. Hence, they cannot be used to reliably predict or compare application performance under different mappings. In this paper, we attempt to model the performance of an application using communication data, such as the communication graph and network hardware counters. We use supervised learning algorithms, such as randomized decision trees, to correlate performance with prior and new metrics. We propose new hybrid metrics that correlate strongly with application performance and may be useful for accurate performance prediction. For three different communication patterns and a production application, we demonstrate a very strong correlation between the proposed metrics and the execution time of these codes.
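The two traditional metrics the abstract starts from can be sketched in a few lines. The sketch below is illustrative, not taken from the paper: the function names and the dictionary-based communication-graph representation (task pairs mapped to byte counts) are assumptions. Hop distance on a torus uses the standard wraparound shortest path in each dimension.

```python
def torus_hops(a, b, dims):
    """Minimum hop distance between node coordinates a and b on a torus
    whose per-dimension sizes are given in dims (wraparound links)."""
    return sum(min(abs(x - y), d - abs(x - y))
               for x, y, d in zip(a, b, dims))

def mapping_metrics(comm_graph, mapping, dims):
    """Compute (max dilation, average hops per byte) for a task-to-node
    mapping.  comm_graph: {(src_task, dst_task): bytes_sent};
    mapping: {task: node_coordinates}."""
    max_dilation = 0
    weighted_hops = 0
    total_bytes = 0
    for (src, dst), nbytes in comm_graph.items():
        hops = torus_hops(mapping[src], mapping[dst], dims)
        max_dilation = max(max_dilation, hops)
        weighted_hops += hops * nbytes   # hop-bytes for this message
        total_bytes += nbytes
    return max_dilation, weighted_hops / total_bytes
```

For example, on a 4x4 2D torus with tasks 0, 1, 2 placed at (0,0), (1,0), (3,0), a 100-byte message from task 0 to 1 travels 1 hop and a 50-byte message from task 1 to 2 travels 2 hops, giving a maximum dilation of 2 and an average of 200/150 hops per byte. The paper's point is that such scalar summaries discard where the hop-bytes concentrate, which is why it turns to learned, hybrid metrics instead.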