Dynamic percolation: a case of study on the shortcomings of traditional optimization in many-core architectures

Authors:
Elkin Garcia;Daniel Orozco;Rishi Khan;Ioannis E. Venetis;Kelly Livingston;Guang R. Gao
Affiliations:
University of Delaware, Newark, DE, USA;University of Delaware, Newark, DE, USA;ET International, Newark, DE, USA;University of Patras, Rion, Greece;University of Delaware, Newark, DE, USA;University of Delaware, Newark, DE, USA
Venue:
Proceedings of the 9th conference on Computing Frontiers
Year:
2012

Abstract

This paper discusses the shortcomings of traditional static optimization techniques when applied to many-core architectures. We argue that these shortcomings result from the significantly different environment found in many-cores. We analyze previous attempts to optimize Dense Matrix Multiplication (DMM) that failed to achieve high performance despite extensive optimization effort. We found that percolation (prefetching data) and scheduling play a central role in application performance. To overcome these difficulties, we (1) fused dynamic scheduling and percolation into a dynamic percolation approach and (2) added further percolation operations. These techniques increased the performance of the application in our study from 44 GFLOPS (out of a possible 80 GFLOPS) to 70.0 GFLOPS (operands in SRAM) or 65.6 GFLOPS (operands in DRAM).
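The idea of combining dynamic scheduling with percolation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`percolate`, `multiply_tile`, `worker`, `dynamic_percolation_matmul`), the tile size, and the use of Python threads are all assumptions made for illustration, and the matrices are assumed to be square. Workers pull output tiles from a shared queue (dynamic scheduling), and each worker copies the operands for the next step into local buffers before computing the current step (a stand-in for percolation). Python threads only model the ordering of these operations; they do not reproduce the overlap of data movement and computation that real hardware would provide.

```python
import queue
import threading


def percolate(A, B, i0, j0, k0, tile):
    """Copy the operand tiles for one k-step into local buffers.

    Hypothetical stand-in for prefetching data into fast local memory.
    """
    a = [row[k0:k0 + tile] for row in A[i0:i0 + tile]]
    b = [row[j0:j0 + tile] for row in B[k0:k0 + tile]]
    return a, b


def multiply_tile(a, b, acc):
    """Accumulate the product of local tiles a and b into acc."""
    for i, a_row in enumerate(a):
        for k, a_ik in enumerate(a_row):
            b_row = b[k]
            for j, b_kj in enumerate(b_row):
                acc[i][j] += a_ik * b_kj


def worker(tasks, A, B, C, tile):
    """Pull output tiles from the shared queue until it is empty."""
    n = len(A)
    k_steps = list(range(0, n, tile))
    while True:
        try:
            i0, j0 = tasks.get_nowait()  # dynamic scheduling: take the next free tile
        except queue.Empty:
            return
        rows = min(tile, n - i0)
        cols = min(tile, n - j0)
        acc = [[0.0] * cols for _ in range(rows)]
        nxt = percolate(A, B, i0, j0, k_steps[0], tile)
        for idx in range(len(k_steps)):
            a, b = nxt
            if idx + 1 < len(k_steps):
                # Fetch operands for the next step before computing this one.
                nxt = percolate(A, B, i0, j0, k_steps[idx + 1], tile)
            multiply_tile(a, b, acc)
        # Each output tile belongs to exactly one task, so no lock is needed.
        for i in range(rows):
            for j in range(cols):
                C[i0 + i][j0 + j] = acc[i][j]


def dynamic_percolation_matmul(A, B, tile=2, workers=4):
    """Multiply square matrices A and B given as lists of lists."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    tasks = queue.Queue()
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            tasks.put((i0, j0))
    threads = [
        threading.Thread(target=worker, args=(tasks, A, B, C, tile))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C
```

Calling `dynamic_percolation_matmul(A, B, tile=2, workers=4)` returns the product as a new list of lists. Assigning whole output tiles as tasks keeps the accumulation for each tile inside a single worker, which avoids synchronization on `C`; a static schedule would instead fix the tile-to-worker mapping ahead of time.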