A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
A programming system for the imagine media processor
A programming system for the imagine media processor
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Straight Method for Reallocation of Complex Cores by Dynamic Reconfiguration in Virtex II FPGAs
RSP '05 Proceedings of the 16th IEEE International Workshop on Rapid System Prototyping
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Programming for parallelism and locality with hierarchically tiled arrays
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Design and applications of a reconfigurable computing system for high performance digital signal processing
The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-based Computer
Journal of VLSI Signal Processing Systems
Proceedings of the conference on Design, automation and test in Europe
Designing BEE: a hardware emulation engine for signal processing in low-power wireless applications
EURASIP Journal on Applied Signal Processing
A high-end real-time digital film processing reconfigurable platform
EURASIP Journal on Embedded Systems
Scalable FPGA-based architecture for DCT computation using dynamic partial reconfiguration
ACM Transactions on Embedded Computing Systems (TECS)
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Recent research has shown that field programmable gate arrays (FPGAs) have a large potential for accelerating demanding applications, such as high performance digital signal process applications with low-volume market. The loss of generality in the architecture is one disadvantage of using FPGAs, however, the reconfigurability of FPGAs allow reprogramming for other applications. Therefore, a uniform FPGA-based architecture, an efficient programming model, and a simple mapping method are paramount for the wide acceptance of FPGA technology. This paper presents MASALA, a dynamically reconfigurable FPGA-based accelerator for parallel programs written in thread-intensive and explicit memory management (TEMM) programming models. Our system uses a TEMM programming model to parallelize demanding applications, including application decomposition into separate thread blocks and compute and data load/store decoupling. Hardware engines are included into MASALA using partial dynamic reconfiguration modules, each of which encapsulates a thread process engine that implements the hardware's thread functionality. A data dispatching scheme is also included in MASALA to enable the explicit communication of multiple memory hierarchies such as interhardware engines, host processors, and hardware engines. Finally, this paper illustrates a multi-FPGA prototype system of the presented architecture: MASALA-SX. A large synthetic aperture radar image formatting experiment shows that MASALA's architecture facilitates the construction of a TEMM program accelerator by providing greater performance and less power consumption than current CPU platforms, without sacrificing programmability, flexibility, and scalability.