Hogs and slackers: Using operations balance in a genetic algorithm to optimize sparse algebra computation on distributed architectures

Authors:
Una-May O'Reilly;Eric Robinson;Sanjeev Mohindra;Julie Mullen;Nadya Bliss
Affiliations:
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA;Lincoln Laboratory, Massachusetts Institute of Technology, Cambridge, USA;Lincoln Laboratory, Massachusetts Institute of Technology, Cambridge, USA;Lincoln Laboratory, Massachusetts Institute of Technology, Cambridge, USA;Lincoln Laboratory, Massachusetts Institute of Technology, Cambridge, USA
Venue:
Parallel Computing
Year:
2010

Abstract

We present a framework for optimizing the distributed performance of sparse matrix computations. These computations are optimally parallelized by distributing their operations across processors in a subtly uneven balance. Because the optimal balance point depends on the non-zero patterns in the data, the algorithm, and the underlying hardware architecture, it is difficult to determine. The Hogs and Slackers genetic algorithm (GA) identifies processors with many operations (hogs) and processors with few operations (slackers). Its intelligent operation-balancing mutation operator swaps data blocks between hogs and slackers to explore new balance points. We show that this operator is integral to the performance of the genetic algorithm, and we use the framework to conduct an architecture study that varies network specifications. The Hogs and Slackers GA is itself a parallel algorithm with near-linear speedup on a large computing cluster.
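
The hog/slacker swap described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how hogs and slackers are selected or which blocks are exchanged, so the sketch assumes a single hog (the most loaded processor), a single slacker (the least loaded one), and a uniformly random choice of one block from each. The names `processor_loads`, `hog_slacker_mutation`, `block_ops`, and `mapping` are hypothetical.

```python
import random


def processor_loads(block_ops, mapping, n_procs):
    """Sum the operation counts of the blocks assigned to each processor."""
    loads = [0] * n_procs
    for block, proc in mapping.items():
        loads[proc] += block_ops[block]
    return loads


def hog_slacker_mutation(block_ops, mapping, n_procs, rng):
    """Return a mutated copy of `mapping` (block -> processor).

    Assumption: the hog is the processor with the most operations and the
    slacker is the one with the fewest. One randomly chosen block is swapped
    between them; the input mapping is left unmodified.
    """
    loads = processor_loads(block_ops, mapping, n_procs)
    hog = max(range(n_procs), key=loads.__getitem__)
    slacker = min(range(n_procs), key=loads.__getitem__)
    child = dict(mapping)
    if hog == slacker:
        return child  # all loads are equal, nothing to rebalance

    hog_blocks = [b for b, p in mapping.items() if p == hog]
    slacker_blocks = [b for b, p in mapping.items() if p == slacker]
    if not hog_blocks:
        return child  # every processor has zero blocks of work

    # Move one block from the hog to the slacker ...
    child[rng.choice(hog_blocks)] = slacker
    # ... and, if the slacker owns any block, move one back to complete the swap.
    if slacker_blocks:
        child[rng.choice(slacker_blocks)] = hog
    return child


if __name__ == "__main__":
    # Hypothetical operation counts per data block and an initial mapping.
    ops = {"b0": 10, "b1": 8, "b2": 1, "b3": 1}
    parent = {"b0": 0, "b1": 0, "b2": 1, "b3": 1}
    child = hog_slacker_mutation(ops, parent, 2, random.Random(0))
    print("before:", processor_loads(ops, parent, 2))  # before: [18, 2]
    print("after: ", processor_loads(ops, child, 2))
```

In a GA, a mutated mapping such as `child` would still need to be scored by a fitness function (for example, a performance model of the target architecture). Because the abstract states that the best balance is subtly uneven, equalizing operation counts alone should not be taken as the objective.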