This paper presents a novel embedded processor element based on the Transport Triggered Architecture. The processor element comprises two powerful arithmetic clusters and is designed with the application-specific instruction processor methodology; it achieves high performance and is particularly effective at exploiting the instruction-level and data-level parallelism found in multimedia applications. To improve efficiency, the processor also provides a decoupled stream memory system whose stream buffer proxy supports cross-line indexed accesses and increases memory bandwidth. A heterogeneous multiprocessor SoC incorporating the embedded processor was fabricated in a 0.13 µm CMOS process; the SoC operates at 400 MHz and consumes only about 690 mW. Experimental results show that the embedded processor element delivers a clear performance improvement for multimedia applications.
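For readers unfamiliar with the transport-triggered model, the following minimal Python sketch illustrates the general idea only: a program is a sequence of data moves, and writing to a function unit's trigger port is what starts the operation. It is not a model of the processor element described here; the `FunctionUnit` class, the `run` helper, the port names, and the register names are all hypothetical and chosen purely for illustration.

```python
import operator


class FunctionUnit:
    """Hypothetical function unit with an operand port, a trigger port, and a result port."""

    def __init__(self, op):
        self.op = op
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            # Writing the operand port only latches the value.
            self.operand = value
        elif port == "trigger":
            # Writing the trigger port starts the operation as a side effect of the move.
            self.result = self.op(self.operand, value)
        else:
            raise ValueError(f"unknown port: {port}")


def run(moves, units, regs):
    """Execute a list of (source, destination) moves, one move at a time."""
    for src, dst in moves:
        # Resolve the source: an immediate, a unit's result port, or a register.
        if isinstance(src, int):
            value = src
        elif "." in src:
            unit, port = src.split(".")
            if port != "result":
                raise ValueError(f"cannot read port: {port}")
            value = units[unit].result
        else:
            value = regs[src]
        # Deliver the value to a unit port or a register.
        if "." in dst:
            unit, port = dst.split(".")
            units[unit].write(port, value)
        else:
            regs[dst] = value
    return regs


units = {"add": FunctionUnit(operator.add), "mul": FunctionUnit(operator.mul)}
regs = {"r0": 3, "r1": 4, "r2": 0}

# Computes r2 = (r0 + r1) * 2 using moves only; there are no opcodes in the program.
program = [
    ("r0", "add.operand"),
    ("r1", "add.trigger"),          # the add fires here
    ("add.result", "mul.operand"),  # result forwarded directly between units
    (2, "mul.trigger"),             # the multiply fires here
    ("mul.result", "r2"),
]
run(program, units, regs)
print(regs["r2"])
```

In this sketch the intermediate sum never passes through a register: it is moved straight from one unit's result port to the next unit's operand port. A real design would schedule several such moves per cycle across multiple buses, which this sequential loop does not attempt to model.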