We leverage supervised learning to enable large-scale analysis of performance logs, with the goals of accurately classifying code runs and understanding the importance of different performance metrics. Previous work has demonstrated that high performance codes exhibit structured communication patterns. By categorizing these patterns, we can identify which code was executed. The ability to identify a code by its performance profile is useful for specializing HPC security systems and for identifying common optimizations for similar codes. We apply supervised machine learning to an extensive data set of real user runs from a high performance computing center. We employ and modify a rule ensemble method to predict which code was run given a performance log. The naive method achieves greater than 93% accuracy. When modified to allow an "other class," accuracy increases to greater than 97%. This modification allows an anomalous run to be flagged as not belonging to a previously seen, or acceptable, code and provides additional latitude in monitoring what is run on supercomputing facilities. We conclude by interpreting the resulting rule model, which shows which components of a code are most distinctive and useful for identification.
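The abstract does not say how the "other class" is implemented, so the following is only a minimal sketch of one plausible reading: each rule votes for a code with some weight, the highest-scoring code wins, and a run whose best score falls below a cutoff is labeled "other". The metric names (`mpi_time_frac`, `avg_msg_bytes`), the rule cutoffs and weights, the labels `code_a` and `code_b`, and the `min_score` parameter are all hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# A rule is (predicate over a performance log, code label it votes for, weight).
Rule = Tuple[Callable[[Dict[str, float]], bool], str, float]

# Hypothetical rules: metric names, cutoffs, weights, and labels are invented.
RULES: List[Rule] = [
    (lambda log: log["mpi_time_frac"] > 0.5, "code_a", 1.0),
    (lambda log: log["avg_msg_bytes"] < 1024, "code_a", 0.5),
    (lambda log: log["mpi_time_frac"] <= 0.5, "code_b", 1.0),
    (lambda log: log["avg_msg_bytes"] >= 65536, "code_b", 0.5),
]


def class_scores(log: Dict[str, float], rules: List[Rule] = RULES) -> Dict[str, float]:
    """Sum the weights of the rules that fire, separately for each code label."""
    scores: Dict[str, float] = {}
    for predicate, label, weight in rules:
        scores.setdefault(label, 0.0)
        if predicate(log):
            scores[label] += weight
    return scores


def classify(log: Dict[str, float], min_score: float = 1.5,
             rules: List[Rule] = RULES) -> str:
    """Return the best-scoring label, or "other" if no label scores high enough."""
    scores = class_scores(log, rules)
    best = max(scores, key=scores.get)
    # Assumed rejection step: a weak best score means the run resembles no known code.
    return best if scores[best] >= min_score else "other"
```

In this sketch, a log such as `{"mpi_time_frac": 0.8, "avg_msg_bytes": 4096}` fires only one `code_a` rule, so its best score stays below `min_score` and the run is flagged as "other" rather than forced into a known class. Raising the cutoff trades more flagged runs for fewer confident misclassifications.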