Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Roofline: an insightful visual performance model for multicore architectures
Communications of the ACM - A Direct Path to Dependable Software
Dynamic warp subdivision for integrated branch and memory divergence tolerance
Proceedings of the 37th annual international symposium on Computer architecture
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
GPUs and the Future of Parallel Computing
IEEE Micro
Improving GPU performance via large warps and two-level warp scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
ACM Transactions on Computer Systems (TOCS)
Dynamic warp resizing: Analysis and benefits in high-performance SIMT
ICCD '12 Proceedings of the 2012 IEEE 30th International Conference on Computer Design (ICCD 2012)
Proceedings of International Workshop on Adaptive Self-tuning Computing Systems
Hi-index | 0.00 |
As graphics processing units (GPUs) are becoming increasingly popular for general purpose workloads (GPGPU), the question arises how such processors will evolve architecturally in the near future. In this work, we identify and discuss tradeoffs for three GPU architecture parameters: active thread count, compute-memory ratio, and cluster and warp sizing. For each parameter, we propose changes to improve GPU design, keeping in mind trends such as dark silicon and the increasing popularity of GPGPU architectures. A key-enabler is dynamism and workload-adaptiveness, enabling among others: dynamic register file sizing, latency aware scheduling, roofline-aware DVFS, runtime cluster fusion, and dynamic warp sizing.