Thread warping: a framework for dynamic synthesis of thread accelerators

Authors: Greg Stitt; Frank Vahid
Affiliations: University of Florida, Gainesville, FL; University of California, Riverside, Riverside, CA
Venue: CODES+ISSS '07: Proceedings of the 5th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis
Year: 2007

Abstract

We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circuits on FPGAs (field-programmable gate arrays). Thread warping builds on warp processing, which performs dynamic synthesis for single-processor, single-thread systems. It improves the performance of multiprocessor systems in two ways: by speeding up individual threads and by allowing more threads to execute concurrently. Furthermore, thread warping maintains the important separation of function from architecture, enabling applications to be ported to architectures with different quantities of microprocessors and FPGA resources, an advantage not shared by static compilation/synthesis approaches. We introduce a framework comprising an architecture, CAD tools, and an operating system that together support thread warping. Experiments on an extensive architectural simulation framework that we developed show, for eight benchmark applications, speedups of 4x to 502x (130x on average) compared to a multiprocessor system with four ARM11 microprocessors. Even compared to a 64-processor system, thread warping achieves an 11x speedup.