Clustering performance data efficiently at massive scales

Authors:
Todd Gamblin;Bronis R. de Supinski;Martin Schulz;Rob Fowler;Daniel A. Reed
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA;Lawrence Livermore National Laboratory, Livermore, CA;Lawrence Livermore National Laboratory, Livermore, CA;University of North Carolina, Chapel Hill, NC;Microsoft Research, Redmond, WA
Venue:
Proceedings of the 24th ACM International Conference on Supercomputing
Year:
2010

Abstract

Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune their applications and to exploit these systems fully. However, extreme scales pose unique challenges for performance-tuning tools, which can generate significant volumes of I/O. Compute-to-I/O ratios have increased drastically as systems have grown, and the I/O systems of large machines can handle the peak load from only a small fraction of cores. Tool developers need efficient techniques to analyze and to reduce performance data from large numbers of cores. We introduce CAPEK, a novel parallel clustering algorithm that enables in-situ analysis of performance data at run time. Our algorithm scales sub-linearly to 131,072 processes, running in less than one second even at that scale, which is fast enough for on-line use in production runs. The CAPEK implementation is fully generic and can be used for many types of analysis. We demonstrate its application to statistical trace sampling. Specifically, we use our algorithm to efficiently compute stratified sampling strategies for traces at run time. We show that such stratification can result in data-volume reduction of up to four orders of magnitude on current large-scale systems, with potential for greater reductions for future extreme-scale systems.
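
To make the idea of stratified trace sampling concrete, the following is a minimal sketch, not the CAPEK algorithm itself. It assumes that a clustering step has already assigned each process rank to a behavior cluster, and it treats each cluster as a stratum. The function name `stratified_sample`, the proportional allocation rule, and the example labels are illustrative assumptions that do not come from the paper.

```python
import random
from collections import defaultdict


def stratified_sample(labels, total_samples, seed=0):
    """Choose process ranks to trace, one group per cluster (stratum).

    labels[i] is the cluster id of rank i (assumed to come from some
    earlier clustering step). Samples are allocated to each cluster in
    proportion to its size, with at least one rank per cluster.
    """
    rng = random.Random(seed)

    # Group ranks by cluster id.
    strata = defaultdict(list)
    for rank, label in enumerate(labels):
        strata[label].append(rank)

    n = len(labels)
    chosen = {}
    for label, ranks in strata.items():
        # Proportional allocation, clamped to [1, size of the stratum].
        k = max(1, round(total_samples * len(ranks) / n))
        k = min(k, len(ranks))
        chosen[label] = sorted(rng.sample(ranks, k))
    return chosen


# Hypothetical input: 12 processes grouped into 3 behavior clusters.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
sample = stratified_sample(labels, total_samples=6)
for label, ranks in sorted(sample.items()):
    print(f"cluster {label}: trace ranks {ranks}")
```

In this sketch only the selected ranks would write trace data, so the volume written drops roughly in proportion to the fraction of ranks sampled. Other allocation rules, for example ones that weight each stratum by its variance, would fit the same structure.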