Proceedings of the 1990 ACM/IEEE conference on Supercomputing
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing
IEEE Transactions on Parallel and Distributed Systems
IEEE Micro
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Multigrain Parallel Processing on Compiler Cooperative Chip Multiprocessor
INTERACT '05 Proceedings of the 9th Annual Workshop on Interaction between Compilers and Computer Architectures
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Hierarchical parallelism control for multigrain parallel processing
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
Heterogeneous multicore architectures, integrating several kinds of accelerator cores in addition to general purpose processor cores, have been attracting much attention to realize high performance with low power consumption. To attain effective high performance, high application software productivity, and low power consumption on heterogeneous multicores, cooperation between an architecture and a parallelizing compiler is important. This paper proposes a compiler cooperative heterogeneous multicore architecture and parallelizing compilation scheme for it. Performance of the proposed scheme is evaluated on the heterogeneous multicore integrating Hitachi and Renesas' SH4A processor cores and Hitachi's FE-GA accelerator cores, using an MP3 encoder. The heterogeneous multicore gives us 14.34 times speedup with two SH4As and two FE-GAs, and 26.05 times speedup with four SH4As and four FE-GAs against sequential execution with a single SH4A. The cooperation between the heterogeneous multicore architecture and the parallelizing compiler enables to achieve high performance in a short development period.