SemCache: semantics-aware caching for efficient GPU offloading

Authors:
Nabeel AlSaber;Milind Kulkarni
Affiliations:
Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA
Venue:
Proceedings of the 27th international ACM conference on International conference on supercomputing
Year:
2013

Abstract

Recently, GPU libraries have made it easy to improve application performance by offloading computation to the GPU. However, using such libraries requires programmers to manage data movement between the CPU and GPU memory spaces explicitly. In complex applications, it is very difficult to optimize CPU-GPU communication across multiple kernel invocations so that redundant transfers are avoided. In this paper, we introduce SemCache, a semantics-aware GPU cache that automatically manages CPU-GPU communication and dynamically eliminates redundant transfers through caching. Its key feature is the use of library semantics to determine the appropriate caching granularity for a given offloaded library (e.g., matrices in BLAS). We applied SemCache to BLAS libraries to provide a drop-in GPU replacement library that handles communication and its optimization automatically. Our caching technique is efficient: it tracks only whole matrices rather than every memory access at fine granularity. Experimental results show that our system can dramatically reduce redundant communication for real-world computational science applications and deliver significant performance improvements, outperforming GPU-based implementations such as CULA and CUBLAS.
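
The idea of caching at matrix granularity can be illustrated with a small simulation. The sketch below is not the paper's implementation: the three-state scheme, the `MatrixCache` class, and the `gemm` stand-in are all hypothetical names chosen for illustration, and no real GPU or BLAS call is made. It only counts the transfers that a per-matrix validity check would issue across a chain of offloaded calls.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    CPU_VALID = auto()  # only the CPU copy is current
    GPU_VALID = auto()  # only the GPU copy is current
    SHARED = auto()     # both copies are current


@dataclass
class MatrixCache:
    """Hypothetical cache: one state per matrix, not per memory access."""
    states: dict = field(default_factory=dict)
    to_gpu: int = 0  # number of simulated CPU-to-GPU transfers
    to_cpu: int = 0  # number of simulated GPU-to-CPU transfers

    def _state(self, name):
        # Assumption: a matrix seen for the first time lives only on the CPU.
        return self.states.setdefault(name, State.CPU_VALID)

    def use_on_gpu(self, name, read=True, written=False):
        # Transfer only if the GPU needs the contents and its copy is stale.
        if read and self._state(name) is State.CPU_VALID:
            self.to_gpu += 1
            self.states[name] = State.SHARED
        if written:
            self.states[name] = State.GPU_VALID

    def use_on_cpu(self, name, written=False):
        # Transfer back only if the CPU copy is stale.
        if self._state(name) is State.GPU_VALID:
            self.to_cpu += 1
            self.states[name] = State.SHARED
        if written:
            self.states[name] = State.CPU_VALID


def gemm(cache, a, b, c):
    """Stand-in for an offloaded C = A * B; only the transfers are modelled."""
    cache.use_on_gpu(a)
    cache.use_on_gpu(b)
    cache.use_on_gpu(c, read=False, written=True)  # C is overwritten


cache = MatrixCache()
gemm(cache, "A", "B", "C")  # A and B are copied to the GPU
gemm(cache, "C", "B", "D")  # C and B are already current on the GPU
cache.use_on_cpu("D")       # D is copied back once
cache.use_on_cpu("D")       # second read finds a current CPU copy
print(cache.to_gpu, cache.to_cpu)
```

In this sketch, a wrapper that copied every operand in and every result out on each call would issue four uploads and two downloads for the same sequence, whereas the per-matrix state check issues two uploads and one download.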