Histograms of Oriented Gradients for Human Detection
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Designing Modular Hardware Accelerators in C with ROCCC 2.0
FCCM '10 Proceedings of the 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
Design space exploration of a mesochronous link for cost-effective and flexible GALS NOCs
Proceedings of the Conference on Design, Automation and Test in Europe
Cache-Based Memory Copy Hardware Accelerator for Multicore Systems
IEEE Transactions on Computers
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
A design methodology to implement memory accesses in high-level synthesis
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Massively parallel programming models used as hardware description languages: the OpenCL case
Proceedings of the International Conference on Computer-Aided Design
Architecture support for accelerator-rich CMPs
Proceedings of the 49th Annual Design Automation Conference
Is dark silicon useful?: harnessing the four horsemen of the coming dark silicon apocalypse
Proceedings of the 49th Annual Design Automation Conference
Introduction of Architecturally Visible Storage in Instruction Set Extensions
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
CHARM: a composable heterogeneous accelerator-rich microprocessor
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
OpenMP-based Synergistic Parallelization and HW Acceleration for On-Chip Shared-Memory Clusters
DSD '12 Proceedings of the 2012 15th Euromicro Conference on Digital System Design
Neural Acceleration for General-Purpose Approximate Programs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Several many-core designs tackle scalability issues by leveraging tightly-coupled clusters as building blocks, where low-latency, high-bandwidth interconnection between a small/medium number of cores and L1 memory achieves high performance/watt. Tight coupling of hardware accelerators into these multi-core clusters constitutes a promising approach to further improve performance/area/watt. However, accelerators are often clocked at a lower frequency than processor clusters for energy efficiency reasons. In this paper, we propose a technique to integrate shared-memory accelerators within the tightly-coupled clusters of the STMicroelectronics STHORM architecture. Our methodology significantly relaxes timing constraints for tightly-coupled accelerators, while optimizing data bandwidth. In addition, our technique allows to operate the accelerator at an integer submultiple of the cluster frequency. Experimental results show that the proposed approach allows to recover up to 84% of the slow-down implied by reduced accelerator speed.