The CODE 2.0 graphical parallel programming language
ICS '92 Proceedings of the 6th international conference on Supercomputing
A case for intelligent disks (IDISKs)
ACM SIGMOD Record
ACM Transactions on Computer Systems (TOCS)
Programming Microsoft Directshow
Programming Microsoft Directshow
P-RIO: A Modular Parallel-Programming Environment
IEEE Concurrency
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Queue - Open Source
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Journal of Parallel and Distributed Computing
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Programming using RapidMind on the Cell BE
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Fast computation of database operations using graphics processors
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
FFPF: fairly fast packet filters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Computer
Ruler: high-speed packet matching and rewriting on NPUs
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Tapping into the fountain of CPUs: on operating system support for programmable devices
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accelerating computing with the cell broadband engine processor
Proceedings of the 5th conference on Computing frontiers
Relational joins on graphics processors
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
PipesFS: fast Linux I/O in the unix tradition
ACM SIGOPS Operating Systems Review - Research and developments in the Linux kernel
Liquid Metal: Object-Oriented Programming Across the Hardware/Software Boundary
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
CUDA-Lite: Reducing GPU Programming Complexity
Languages and Compilers for Parallel Computing
Predictive Runtime Code Scheduling for Heterogeneous Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
hiCUDA: a high-level directive-based language for GPU programming
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Relational query coprocessing on graphics processors
ACM Transactions on Database Systems (TODS)
Distributed aggregation for data-parallel computing: interfaces and implementations
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Lime: a Java-compatible and synthesizable language for heterogeneous architectures
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Maestro: data orchestration and tuning for OpenCL devices
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Data-Aware Task Scheduling on Multi-accelerator Based Platforms
ICPADS '10 Proceedings of the 2010 IEEE 16th International Conference on Parallel and Distributed Systems
Copperhead: compiling an embedded data parallel language
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Application-Tailored I/O with Streamline
ACM Transactions on Computer Systems (TOCS)
CIEL: a universal execution engine for distributed data-flow computing
Proceedings of the 8th USENIX conference on Networked systems design and implementation
A static task partitioning approach for heterogeneous systems using OpenCL
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Automatic compilation of MATLAB programs for synergistic execution on heterogeneous processors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Processing data streams with hard real-time constraints on heterogeneous systems
Proceedings of the international conference on Supercomputing
TimeGraph: GPU scheduling for real-time multi-tasking environments
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Pegasus: coordinated scheduling for virtualized accelerator-based systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Productive cluster programming with OmpSs
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
PTask: operating system abstractions to manage GPUs as compute devices
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Liszt: a domain specific language for building portable mesh-based PDE solvers
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
FPL-3: towards language support for distributed packet processing
NETWORKING'05 Proceedings of the 4th IFIP-TC6 international conference on Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communication Systems
FPL-3E: towards language support for reconfigurable packet processing
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Enabling task-level scheduling on heterogeneous platforms
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Fast PageRank Computation on a GPU Cluster
PDP '12 Proceedings of the 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing
Scientific and Engineering Computing Using ATI Stream Technology
Computing in Science and Engineering
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Scheduling Concurrent Applications on a Cluster of CPU-GPU Nodes
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Spinning fast iterative data flows
Proceedings of the VLDB Endowment
REX: recursive, delta-based data-centric computation
Proceedings of the VLDB Endowment
Parallel PageRank computation using GPUs
Proceedings of the Third Symposium on Information and Communication Technology
IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
Rootbeer: Seamlessly Using GPUs from Java
HPCC '12 Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems
Legion: expressing locality and independence with logical regions
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Accelerating text mining workloads in a MapReduce-based distributed GPU environment
Journal of Parallel and Distributed Computing
GPUfs: integrating a file system with GPUs
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
LINQits: big data on little clients
Proceedings of the 40th Annual International Symposium on Computer Architecture
Naiad: a timely dataflow system
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Hi-index | 0.00 |
Computer systems increasingly rely on heterogeneity to achieve greater performance, scalability and energy efficiency. Because heterogeneous systems typically comprise multiple execution contexts with different programming abstractions and runtimes, programming them remains extremely challenging. Dandelion is a system designed to address this programmability challenge for data-parallel applications. Dandelion provides a unified programming model for heterogeneous systems that span diverse execution contexts including CPUs, GPUs, FPGAs, and the cloud. It adopts the .NET LINQ (Language INtegrated Query) approach, integrating data-parallel operators into general purpose programming languages such as C# and F#. It therefore provides an expressive data model and native language integration for user-defined functions, enabling programmers to write applications using standard high-level languages and development tools. Dandelion automatically and transparently distributes data-parallel portions of a program to available computing resources, including compute clusters for distributed execution and CPU and GPU cores of individual nodes for parallel execution. To enable automatic execution of .NET code on GPUs, Dandelion cross-compiles .NET code to CUDA kernels and uses the PTask runtime [85] to manage GPU execution. This paper discusses the design and implementation of Dandelion, focusing on the distributed CPU and GPU implementation. We evaluate the system using a diverse set of workloads.