Effective memory utilization is critical to reaping the benefits of the multi-core processors emerging in embedded systems. In this paper we explore the use of a stream model to exploit memory hierarchies. We target image processing algorithms running on the Analog Devices Blackfin BF561, a fixed-point, dual-core DSP. Optimized assembly reduces runtime by using the cores more efficiently, but it also underscores the need to mitigate the memory bottleneck. Like other embedded processors, the Blackfin BF561 has L2 SRAM available. Applying the stream model lets us make full use of both cores and the L2 SRAM. We achieve almost a 10x speedup in execution time over non-optimized C code.