Hybrid-core systems speed up applications by offloading certain compute operations to hardware accelerators on which they run faster. However, such systems require significant programming and porting effort before the accelerators yield a performance benefit. It is therefore prudent, prior to porting, to estimate the performance benefit an accelerator would provide for a given workload. To address this problem we present a performance-modeling framework that predicts application performance rapidly and accurately for hybrid-core systems. We present predictions for two full-scale HPC applications: HYCOM and Milc. Our results for two accelerators (a GPU and an FPGA) show that gather/scatter and stream operations can be sped up by as much as a factor of 15, and that the overall compute times of Milc and HYCOM improve by 3.4% and 20%, respectively. We also show that, in order to benefit from the accelerators, 70% of the data-transfer latency between the CPU and the accelerators needs to be hidden.
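The gap between a 15x kernel speedup and a modest overall improvement follows from Amdahl-style reasoning: only the offloaded fraction of runtime shrinks, and CPU-accelerator data transfers add back overhead. The sketch below is illustrative only (it is not the paper's modeling framework, and the example numbers are hypothetical, not the HYCOM/Milc measurements); it shows how offloadable fraction, kernel speedup, and transfer cost combine into an overall speedup estimate.

```python
def hybrid_speedup(offload_frac, kernel_speedup, transfer_frac):
    """Estimate overall speedup from offloading part of a workload.

    offload_frac   -- fraction of original runtime spent in offloadable
                      kernels (e.g. gather/scatter and stream operations)
    kernel_speedup -- accelerator speedup of those kernels (e.g. 15x)
    transfer_frac  -- un-hidden CPU<->accelerator transfer time, as a
                      fraction of the original runtime
    """
    # New runtime = untouched CPU part + accelerated part + transfer cost,
    # all normalized to an original runtime of 1.
    new_time = (1.0 - offload_frac) + offload_frac / kernel_speedup + transfer_frac
    return 1.0 / new_time

# Hypothetical example: 25% of runtime offloadable at 15x, with residual
# transfer overhead equal to 5% of the original runtime.
print(round(hybrid_speedup(0.25, 15.0, 0.05), 2))  # → 1.22
```

Even a large kernel speedup yields a small overall gain when the offloadable fraction is modest, and the `transfer_frac` term makes explicit why most of the transfer latency must be hidden before offloading pays off.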