Parallel image processing applications on a network of workstations
Parallel Computing
IEEE Transactions on Parallel and Distributed Systems
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor
International Journal of Parallel Programming
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing explicit data transfers for data parallel applications on the cell architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Proceedings of the 49th Annual Design Automation Conference
Hi-index | 0.00 |
Reducing the effects of off-chip memory access latency is a key factor in exploiting efficiently embedded multi-core platforms. We consider architectures that admit a multi-core computation fabric, having its own fast and small memory to which the data blocks to be processed are fetched from external memory using a DMA (direct memory access) engine, employing a double- or multiple-buffering scheme to avoid processor idling. In this paper we focus on application programs that process two-dimensional data arrays and we determine automatically the size and shape of the portions of the data array which are subject to a single DMA call, based on hardware and applications parameters. When the computation on different array elements are completely independent, the asymmetry of memory structure leads always to prefer one-dimensional horizontal pieces of memory, while when the computation of a data element shares some data with its neighbors, there is a pressure for more ''square'' shapes to reduce the amount of redundant data transfers. We provide an analytic model for this optimization problem and validate our results by running a mean filter application on the Cell simulator.