Memory coherence in shared virtual memory systems
PODC '86 Proceedings of the fifth annual ACM symposium on Principles of distributed computing
Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
A scalable method for run-time loop parallelization
International Journal of Parallel Programming
Hoard: a scalable memory allocator for multithreaded applications
ACM SIGPLAN Notices
Run-time and compile-time support for adaptive irregular problems
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
ClawHMMER: A Streaming HMMer-Search Implementatio
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Optimizing irregular shared-memory applications for distributed-memory systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Making context-sensitive points-to analysis with heap cloning practical for the real world
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Optimizing irregular shared-memory applications for clusters
Proceedings of the 22nd annual international conference on Supercomputing
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Semi-sparse flow-sensitive pointer analysis
Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
OpenMP to GPGPU: a compiler framework for automatic translation and optimization
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
hiCUDA: a high-level directive-based language for GPU programming
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
An asymmetric distributed shared memory model for heterogeneous parallel systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Points-to analysis with efficient strong updates
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic CPU-GPU communication management and optimization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Atomic-free irregular computations on GPUs
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
SemCache: semantics-aware caching for efficient GPU offloading
Proceedings of the 27th international ACM conference on International conference on supercomputing
Effective dynamic detection of alias analysis errors
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
Design and implementation of the fusion simulator based on multi-shader GPU
Proceedings of the 2013 Research in Adaptive and Convergent Systems
RSVM: a region-based software virtual memory for GPU
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Generating efficient data movement code for heterogeneous architectures with distributed-memory
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Journal of Parallel and Distributed Computing
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Automatic data allocation and buffer management for multi-GPU machines
ACM Transactions on Architecture and Code Optimization (TACO)
Portable and Transparent Host-Device Communication Optimization for GPGPU Environments
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers must ensure a consistent program state between the CPU and GPU memories by managing data. Manually managing data is tedious and error-prone. In prior work on automatic CPU-GPU data management, alias analysis quality limits performance, and type-inference quality limits applicability. This paper presents Dynamically Managed Data (DyManD), the first automatic system to manage complex and recursive data-structures without static analyses. By replacing static analyses with a dynamic run-time system, DyManD overcomes the performance limitations of alias analysis and enables management for complex and recursive data-structures. DyManD-enabled GPU parallelization matches the performance of prior work equipped with perfectly precise alias analysis for 27 programs and demonstrates improved applicability on programs not previously managed automatically.