This paper discusses the shortcomings of traditional static optimization techniques when applied to many-core architectures. We argue that these shortcomings result from the significantly different environment found in many-cores. We analyze previous attempts to optimize Dense Matrix Multiplication (DMM) that failed to achieve high performance despite extensive optimization effort. We found that percolation (prefetching data) and scheduling play a central role in application performance. To overcome these difficulties, we (1) fused dynamic scheduling and percolation into a dynamic percolation approach and (2) added further percolation operations. These techniques increased the performance of the application in our study from 44 GFLOPS (out of a possible 80 GFLOPS) to 70.0 GFLOPS with operands in SRAM, or 65.6 GFLOPS with operands in DRAM.
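As a rough illustration of what fusing dynamic scheduling with percolation could look like, the sketch below multiplies two square matrices tile by tile. It is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the names `load_tiles`, `claim`, `worker`, and `dynamic_percolation_matmul`, the shared task queue, and the tile size are all hypothetical. Each worker claims its next task at run time and stages that task's operand tiles before computing the current one. On real hardware the staging copy would be an asynchronous transfer into local memory that overlaps with computation; here it runs sequentially and only shows the ordering.

```python
import queue
import threading


def load_tiles(A, B, task, tile):
    """Percolation step (hypothetical): copy one task's operand tiles into local buffers."""
    bi, bj, bk = task
    a = [row[bk * tile:(bk + 1) * tile] for row in A[bi * tile:(bi + 1) * tile]]
    b = [row[bj * tile:(bj + 1) * tile] for row in B[bk * tile:(bk + 1) * tile]]
    return a, b


def claim(tasks):
    """Dynamic scheduling step: take the next task at run time, or None when none remain."""
    try:
        return tasks.get_nowait()
    except queue.Empty:
        return None


def worker(A, B, C, tile, tasks, c_lock):
    task = claim(tasks)
    staged = load_tiles(A, B, task, tile) if task is not None else None
    while task is not None:
        a, b = staged
        bi, bj, _ = task
        # Claim the next task and stage its operands before computing this one.
        nxt = claim(tasks)
        staged = load_tiles(A, B, nxt, tile) if nxt is not None else None
        # Compute the tile product from the local buffers only.
        partial = [[sum(a[i][k] * b[k][j] for k in range(tile))
                    for j in range(tile)] for i in range(tile)]
        # Accumulate into the shared result under a lock.
        with c_lock:
            for i in range(tile):
                for j in range(tile):
                    C[bi * tile + i][bj * tile + j] += partial[i][j]
        task = nxt


def dynamic_percolation_matmul(A, B, tile=2, num_workers=4):
    """Return A x B for square matrices whose size is a multiple of `tile`."""
    n = len(A)
    assert n % tile == 0, "matrix size must be a multiple of the tile size"
    blocks = n // tile
    C = [[0] * n for _ in range(n)]
    tasks = queue.Queue()
    for bi in range(blocks):
        for bj in range(blocks):
            for bk in range(blocks):
                tasks.put((bi, bj, bk))
    c_lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(A, B, C, tile, tasks, c_lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C
```

Because tasks are claimed from a shared queue rather than assigned up front, a worker that finishes early simply takes more tiles, which is the property that distinguishes this from a static partition of the work.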