FPGAs are increasingly used to implement coprocessors for applications running on desktop platforms, and such FPGA coprocessing may soon appear in mobile devices. Because different devices run different applications, each device needs a coprocessor set tailored to its usage. We introduce an approach in which a device profiles application usage and uploads that information to a server when docked. The server then determines the best coprocessor set based on that usage and on the device's particular FPGA constraints. The server creates the coprocessor set by combining pre-synthesized coprocessors for each application, and considers multiple versions of the same coprocessor that trade off speed and size. We define the coprocessor set selection problem and propose a Pareto-optimal merge heuristic for the server that yields near-optimal solutions with linear time complexity. We also avoid time-consuming resynthesis of the coprocessors into a single FPGA binary by using small reconfigurable regions with reserved inter-region communication channels. Our experiments show that the Pareto-optimal merge heuristic generates results within 1% of optimal on average and runs 5-20x faster than simulated annealing. The experiments also show that FPGA coprocessors achieve a 3x speedup and 70% energy reduction compared with running the applications only on a microprocessor.
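The abstract does not spell out the merge heuristic, so the following is only a minimal sketch of one way a Pareto-front merge could be applied to the selection problem, not the authors' algorithm. It assumes each application offers several coprocessor versions described by a hypothetical `(area, gain)` pair, that at most one version per application is chosen, and that total area must fit a single FPGA capacity. The names `pareto_prune`, `select_coprocessors`, and the `max_points` cap (which bounds the work per application and makes the result approximate) are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (total area, total gain, chosen version index per application; 0 = software only)
Point = Tuple[int, float, Tuple[int, ...]]


def pareto_prune(points: List[Point], max_points: Optional[int] = None) -> List[Point]:
    """Keep only points that are not dominated (less area, more gain)."""
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    front: List[Point] = []
    best_gain = float("-inf")
    for p in ordered:
        if p[1] > best_gain:  # strictly better gain than any smaller-area point
            front.append(p)
            best_gain = p[1]
    # Assumed thinning step: keep at most max_points (>= 2) evenly spaced
    # points so the front cannot grow without bound.
    if max_points is not None and len(front) > max_points:
        step = (len(front) - 1) / (max_points - 1)
        front = [front[round(i * step)] for i in range(max_points)]
    return front


def select_coprocessors(
    apps: List[List[Tuple[int, float]]],
    capacity: int,
    max_points: Optional[int] = None,
) -> Point:
    """Merge per-application version lists into one front, one application at a time."""
    front: List[Point] = [(0, 0.0, ())]
    for versions in apps:
        options = [(0, 0.0)] + list(versions)  # option 0: no coprocessor
        merged: List[Point] = []
        for area, gain, picks in front:
            for idx, (v_area, v_gain) in enumerate(options):
                if area + v_area <= capacity:  # discard infeasible combinations
                    merged.append((area + v_area, gain + v_gain, picks + (idx,)))
        front = pareto_prune(merged, max_points)
    return max(front, key=lambda p: p[1])  # feasible point with the highest gain


# Two applications, each with a small/slow and a large/fast version (made-up numbers).
apps = [[(2, 3.0), (4, 5.0)], [(3, 4.0), (5, 6.0)]]
best = select_coprocessors(apps, capacity=6)
print(best)
```

With `max_points=None` the merge keeps the full front and is exact for this toy instance. Setting a small cap trades solution quality for a bounded amount of work per application, which is one plausible way to obtain running time that grows linearly with the number of applications.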