Design of storage hierarchy in multithreaded architectures
Proceedings of the 28th annual international symposium on Microarchitecture
Area and System Clock Effects on SMT/CMP Processors
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Transparent Threads: Resource Sharing in SMT Processors for High Single-Thread Performance
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Compiling for instruction cache performance on a multithreaded architecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Out-of-Order Execution may not be Cost-Effective on Processors Featuring Simultaneous Multithreading
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
MPEG-2 Video Decompression on Simultaneous Multithreaded Multimedia Processors
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Soft Real- Time Scheduling on Simultaneous Multithreaded Processors
RTSS '02 Proceedings of the 23rd IEEE Real-Time Systems Symposium
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
Local memory exploration and optimization in embedded systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Supporting microthread scheduling and synchronisation in CMPs
International Journal of Parallel Programming
Hi-index | 0.00 |
In embedded multithreaded architectures, the performance enhancement relative to the base single-threaded architecture is highly dependent on the characteristics of the application and memory configuration. When the application is well parallelized, the multithreading performance may be good even with a small cache since the memory access latency can be hidden. However, if there are complicated dependencies between threads, they cause frequent cache conflicts, so the performance may not be improved. For that reason, not only processor architecture but also memory configuration should be customized to get an optimal solution of an embedded multithreaded system. We suggest a design space exploration algorithm, which considers both memory configuration and multithreaded architecture and a thread shifting technique, which shifts threads in compile time to minimize cache conflict.