Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
Multilevel k-way partitioning scheme for irregular graphs
Journal of Parallel and Distributed Computing
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A linear-time heuristic for improving network partitions
DAC '82 Proceedings of the 19th Design Automation Conference
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Software Pipelined Execution of Stream Programs on GPUs
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Exploring the multiple-GPU design space
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
A computing origami: folding streams in FPGAs
Proceedings of the 46th Annual Design Automation Conference
An adaptive performance modeling tool for GPU architectures
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
IEEE Micro
Sponge: portable stream programming on graphics engines
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A quantitative performance analysis model for GPU architectures
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Multi-GPU MapReduce on GPU Clusters
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Automated Architecture-Aware Mapping of Streaming Applications Onto GPUs
IPDPS '11 Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium
Scaling large-data computations on multi-GPU accelerators
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Graphics processing units leverage on a large array of parallel processing cores to boost the performance of a specific streaming computation pattern frequently found in graphics applications. Unfortunately, while many other general purpose applications do exhibit the required streaming behavior, they also possess unfavorable data layout and poor computation-to-communication ratios that penalize any straight-forward execution on the GPU. In this paper we describe an efficient and scalable code generation framework that can map general purpose streaming applications onto a multi-GPU system. This framework spans the entire core and memory hierarchy exposed by the multi-GPU system. Several key features in our framework ensure the scalability required by complex streaming applications. First, we propose an efficient stream graph partitioning algorithm that partitions the complex application to achieve the best performance under a given shared memory constraint. Next, the resulting partitions are mapped to multiple GPUs using an efficient architecture-driven strategy. The mapping balances the workload while considering the communication overhead. Finally, a highly effective pipeline execution is employed for the execution of the partitions on the multi-GPU system. The framework has been implemented as a back-end of the StreamIt programming language compiler. Our comprehensive experiments show its scalability and significant performance speedup compared with a previous state-of-the-art solution.