Proceedings of the tenth annual conference on Object-oriented programming systems, languages, and applications
An annotation language for optimizing software libraries
Proceedings of the 2nd conference on Domain-specific languages
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Predicate Dispatching: A Unified Theory of Dispatch
ECCOP '98 Proceedings of the 12th European Conference on Object-Oriented Programming
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Cg: a system for programming graphics hardware in a C-like language
ACM SIGGRAPH 2003 Papers
Programmable Stream Processors
Computer
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Metaprogramming GPUs with Sh
OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Best of Both Latency and Throughput
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
Multiple Instruction Stream Processor
Proceedings of the 33rd annual international symposium on Computer Architecture
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Performance evaluation of GPUs using the RapidMind development platform
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Harmony: an execution model and runtime for heterogeneous many core systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
GViM: GPU-accelerated virtual machines
Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
ARC '09 Proceedings of the 5th International Workshop on Reconfigurable Computing: Architectures, Tools and Applications
Supporting MapReduce on large-scale asymmetric multi-core clusters
ACM SIGOPS Operating Systems Review
A Unified Runtime System for Heterogeneous Multi-core Architectures
Euro-Par 2008 Workshops - Parallel Processing
The canals language and its compiler
Proceedings of th 12th International Workshop on Software and Compilers for Embedded Systems
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
Proceedings of the 36th annual international symposium on Computer architecture
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
An Embrace-and-Extend Approach to Managing the Complexity of Future Heterogeneous Systems
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming
APLAS '09 Proceedings of the 7th Asian Symposium on Programming Languages and Systems
Accelerating SQL database operations on a GPU with CUDA
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
High-throughput Bayesian network learning using heterogeneous multicore computers
Proceedings of the 24th ACM International Conference on Supercomputing
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
An integrated GPU power and performance model
Proceedings of the 37th annual international symposium on Computer architecture
Designing Accelerator-Based Distributed Systems for High Performance
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A Capabilities-Aware Programming Model for Asymmetric High-End Systems
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Run-time optimizations for replicated dataflows on heterogeneous environments
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
MapCG: writing parallel program portable between CPU and GPU
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Embracing heterogeneity: parallel programming for changing hardware
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
A capabilities-aware framework for using computational accelerators in data-intensive computing
Journal of Parallel and Distributed Computing
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A domain-specific approach to heterogeneous parallelism
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Cost-aware function migration in heterogeneous systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Optimizing the exploitation of multicore processors and GPUs with OpenMP and OpenCL
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
A static task partitioning approach for heterogeneous systems using OpenCL
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
MDR: performance model driven runtime for heterogeneous parallel platforms
Proceedings of the international conference on Supercomputing
Dymaxion: optimizing memory access patterns for heterogeneous systems
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
The accelerator store: A shared memory framework for accelerator-based systems
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Reflex: using low-power processors in smartphones without knowing them
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Tarazu: optimizing MapReduce on heterogeneous clusters
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Parallelization of Belief Propagation on Cell Processors for Stereo Vision
ACM Transactions on Embedded Computing Systems (TECS)
Exploiting parallelism in deterministic shared memory multiprocessing
Journal of Parallel and Distributed Computing
Improving performance of adaptive component-based dataflow middleware
Parallel Computing
A survey on hardware-aware and heterogeneous computing on multicore processors and accelerators
Concurrency and Computation: Practice & Experience
A virtual memory based runtime to support multi-tenancy in clusters with GPUs
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
A Map-Reduce Based Framework for Heterogeneous Processing Element Cluster Environments
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Optimizing linpack benchmark on GPU-accelerated petascale supercomputer
Journal of Computer Science and Technology - Special issue on Community Analysis and Information Recommendation
VForce: An environment for portable applications on high performance systems with accelerators
Journal of Parallel and Distributed Computing
Optimizing dataflow applications on heterogeneous environments
Cluster Computing
Operating systems should manage accelerators
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
A compiler-assisted runtime-prefetching scheme for heterogeneous platforms
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
Workload and power budget partitioning for single-chip heterogeneous processors
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Automatic generation of software pipelines for heterogeneous parallel systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
ACM Transactions on Architecture and Code Optimization (TACO)
Prius: a runtime for hybrid computing
Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores
An automatic input-sensitive approach for heterogeneous task partitioning
Proceedings of the 27th international ACM conference on International conference on supercomputing
SemCache: semantics-aware caching for efficient GPU offloading
Proceedings of the 27th international ACM conference on International conference on supercomputing
Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms
Proceedings of the ACM International Conference on Computing Frontiers
Load balancing in a changing world: dealing with heterogeneity and performance variability
Proceedings of the ACM International Conference on Computing Frontiers
MapReduce with communication overlap (MaRCO)
Journal of Parallel and Distributed Computing
Temperature aware thread block scheduling in GPGPUs
Proceedings of the 50th Annual Design Automation Conference
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
RSVM: a region-based software virtual memory for GPU
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Efficient implementation of data flow graphs on multi-gpu clusters
Journal of Real-Time Image Processing
Boosting CUDA Applications with CPU---GPU Hybrid Computing
International Journal of Parallel Programming
Hi-index | 0.00 |
In this paper we propose the Merge framework, a general purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to parallel programming on heterogeneous platforms with a rigorous, library-based methodology that can automatically distribute computation across heterogeneous cores to achieve increased energy and performance efficiency. The Merge framework provides (1) a predicate dispatch-based library system for managing and invoking function variants for multiple architectures; (2) a high-level, library-oriented parallel language based on map-reduce; and (3) a compiler and runtime which implement the map-reduce language pattern by dynamically selecting the best available function implementations for a given input and machine configuration. Using a generic sequencer architecture interface for heterogeneous accelerators, the Merge framework can integrate function variants for specialized accelerators, offering the potential for to-the-metal performance for a wide range of heterogeneous architectures, all transparent to the user. The Merge framework has been prototyped on a heterogeneous platform consisting of an Intel Core 2 Duo CPU and an 8-core 32-thread Intel Graphics and Media Accelerator X3000, and a homogeneous 32-way Unisys SMP system with Intel Xeon processors. We implemented a set of benchmarks using the Merge framework and enhanced the library with X3000 specific implementations, achieving speedups of 3.6x -- 8.5x using the X3000 and 5.2x -- 22x using the 32-way system relative to the straight C reference implementation on a single IA32 core.