Journal of Parallel and Distributed Computing
The client computing platform is moving towards a heterogeneous architecture that combines scalar-oriented CPU cores and throughput-oriented accelerator cores. Recognizing that existing programming models for such heterogeneous platforms are still difficult for most programmers, we advocate a shared virtual memory programming model to improve programmability. In this paper, we focus on performance, and demonstrate that users need not sacrifice performance for programmability. We describe our approaches, experiences, and results in optimizing MYO on a heterogeneous platform consisting of a CPU and an Aubrey Isle accelerator. Our efforts involve the whole system software stack including the OS, runtime, and application.