Compiler optimization-space exploration
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Accurate and efficient regression modeling for microarchitectural performance and power prediction
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Program optimization space pruning for a multithreaded gpu
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
Proceedings of the 36th annual international symposium on Computer architecture
A Note on Auto-tuning GEMM for GPUs
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Model-driven autotuning of sparse matrix-vector multiply on GPUs
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
An integrated GPU power and performance model
Proceedings of the 37th annual international symposium on Computer architecture
A case for machine learning to optimize multicore performance
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Statistical power modeling of GPU kernels using performance counters
GREENCOMP '10 Proceedings of the International Conference on Green Computing
Auto-tuning of fast fourier transform on graphics processors
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Tree structured analysis on GPU power study
ICCD '11 Proceedings of the 2011 IEEE 29th International Conference on Computer Design
A performance analysis framework for identifying potential benefits in GPGPU applications
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Stargazer: Automated regression-based GPU design space exploration
ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Hi-index | 0.00 |
Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow in scope and heuristic in operation. This paper proposes and evaluates a statistical analysis technique, Starchart, that partitions the GPU hardware/software tuning space by automatically discerning important inflection points in design parameter values. Unlike prior methods, Starchart can identify the best parameter choices within different regions of the space. Our tool is efficient--evaluating at most 0.3% of the tuning space, and often much less--and is robust enough to analyze highly variable real-system measurements, not just simulation. In one case study, we use it to automatically find platform-specific parameter settings that are 6.3X faster (for AMD) and 1.3X faster (for NVIDIA) than a single general setting. We also show how power-optimized parameter settings can save 47 W (26% of total GPU power) with little performance loss. Overall, Starchart can serve as a foundation for a range of GPU compiler optimizations, auto-tuners, and programmer tools. Furthermore, because Starchart does not rely on specific GPU features, we expect it to be useful for broader CPU/GPU studies as well.