This paper introduces a method that combines task parallelism with fine-grained co-design specialisation to achieve faster execution on distributed heterogeneous architectures than either technique achieves alone. The method uses a novel mixed integer linear programming formulation to assign code sections from parallel tasks to shared computational components, finding the optimal trade-off between the acceleration gained from component specialisation and the serialisation delay incurred by sharing. Results are presented for software benchmarks partitioned using the method and using formal implementations of previous alternatives; they demonstrate both the practical tractability of the linear programming approach and the additional program acceleration it can deliver.
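The trade-off between acceleration and serialisation delay can be illustrated with a toy model. The sketch below is not the paper's formulation: it replaces the mixed integer linear program with exhaustive enumeration, and its cost model is an assumption made here for illustration (tasks run in parallel, a section assigned to a component runs `speedup` times faster, and all sections assigned to the same component execute one after another). The names `best_assignment`, `sections`, and `speedup` are hypothetical.

```python
from itertools import product


def best_assignment(sections, speedup):
    """Exhaustively search section-to-component assignments (toy model).

    sections: list of (task_id, software_time) pairs.
    speedup:  dict mapping a component name to its acceleration factor.
    Returns (makespan, assignment); assignment[i] is a component name,
    or None if section i stays in software on its own task's processor.
    """
    options = [None] + sorted(speedup)
    best = None
    for choice in product(options, repeat=len(sections)):
        task_time = {}   # total time each parallel task needs
        comp_load = {}   # serialised load on each shared component
        for (task, t), comp in zip(sections, choice):
            run = t if comp is None else t / speedup[comp]
            task_time[task] = task_time.get(task, 0.0) + run
            if comp is not None:
                comp_load[comp] = comp_load.get(comp, 0.0) + run
        # Assumed cost: the slowest task or the busiest component dominates.
        makespan = max(list(task_time.values()) + list(comp_load.values()))
        if best is None or makespan < best[0]:
            best = (makespan, choice)
    return best


# Two parallel tasks sharing a 4x accelerator: sharing pays off.
fast = best_assignment([(0, 8.0), (1, 8.0)], {"acc": 4.0})

# Three parallel tasks sharing a 2x accelerator: putting all three on it
# would serialise 12 time units of work, so sharing gives no gain here.
slow = best_assignment([(0, 8.0), (1, 8.0), (2, 8.0)], {"acc": 2.0})
```

In the first case both sections move to the shared component; in the second, serialisation on the component cancels the benefit of specialisation, which is the tension an optimal assignment has to resolve. Enumeration grows exponentially with the number of sections, so it serves only to make the objective concrete.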