Performance of optical flow techniques
International Journal of Computer Vision
Extracting task-level parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
Introspective sorting and selection algorithms
Software—Practice & Experience
Performance estimation of embedded software with instruction cache modeling
ACM Transactions on Design Automation of Electronic Systems (TODAES)
SystemC: a modeling platform supporting multiple design abstractions
Proceedings of the 14th international symposium on Systems synthesis
Synthesis of hardware models in C with pointers and complex data structures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Adaptive Optimizing Compilers for the 21st Century
The Journal of Supercomputing
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
A framework for performance modeling and prediction
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Codesign-extended applications
Proceedings of the tenth international symposium on Hardware/software codesign
Multi-objective design space exploration using genetic algorithms
Proceedings of the tenth international symposium on Hardware/software codesign
A performance analysis of the Berkeley UPC compiler
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
SPARK: A High-Lev l Synthesis Framework For Applying Parallelizing Compiler Transformations
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Runtime Assignment of Reconfigurable Hardware Components for Image Processing Pipelines
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
A quantitative analysis of the speedup factors of FPGAs over processors
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Pace--A Toolset for the Performance Prediction of Parallel and Distributed Systems
International Journal of High Performance Computing Applications
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Content-based image retrieval: approaches and trends of the new age
Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval
A code refinement methodology for performance-improved synthesis from C
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Examining the viability of FPGA supercomputing
EURASIP Journal on Embedded Systems
Performance estimation of distributed real-time embedded systems by discrete event simulations
EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
Algorithms for Reporting and Counting Geometric Intersections
IEEE Transactions on Computers
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
RAT: a methodology for predicting performance in application design migration to FPGAs
HPRCTA '07 Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07
PetaBricks: a language and compiler for algorithmic choice
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Optimized generation of memory structure in compiling window operations onto reconfigurable hardware
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
FPGA Circuit Synthesis of Accelerator Data-Parallel Programs
FCCM '10 Proceedings of the 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
Characterization of Fixed and Reconfigurable Multi-Core Devices for Application Acceleration
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
A Simulation Framework for Rapid Analysis of Reconfigurable Computing Systems
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Novo-G: At the Forefront of Scalable Reconfigurable Supercomputing
Computing in Science and Engineering
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
Concurrency and Computation: Practice & Experience - Euro-Par 2009
FUSE: Front-End User Framework for O/S Abstraction of Hardware Accelerators
FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines
Dynafuse: dynamic dependence analysis for FPGA pipeline fusion and locality optimizations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Hi-index | 0.00 |
Due to power limitations and escalating cooling costs, high-performance computing systems can no longer rely solely on faster clock frequencies and numerous microprocessor nodes to meet increasing performance demands. As an alternative approach, high-performance systems are increasingly integrating multi-core processors and heterogeneous accelerators such as GPUs and FPGAs. However, usage of such hybrid systems has been limited largely to device experts due to significantly increased application design complexity. To enable more transparent usage of hybrid systems, we introduce elastic computing, which is an optimization framework where application designers invoke specialized elastic functions that contain a knowledge-base of implementation alternatives and parallelization strategies. For each elastic function, a collection of optimization tools analyze numerous possible implementations which enables dynamic and transparent optimization for different resources and run-time parameters. In this paper, we present the enabling technologies of elastic computing, and evaluate those technologies on four different hybrid systems, including the Novo-G FPGA supercomputer. The results include detailed case studies of using elastic computing for time-domain convolution and sum of absolute difference image retrieval, which achieved speedups up to 206x.