Advances in process technology enable the integration of multiple cores on a single die, so that several tasks can be processed in parallel. For media applications, the demand for extensive memory bandwidth becomes a major performance bottleneck in multi-core architectures. A parallel memory system is a viable way to serve the large number of memory transactions issued by multiple cores, but memory access conflicts, caused by two or more cores simultaneously accessing the same memory page, limit the performance of the multi-core architecture. We propose and evaluate a programmable memory address shuffler, together with a novel memory shuffling algorithm, integrated in a multi-core architecture with a parallel memory system. The address shuffler translates requested memory addresses into shuffled addresses so that the number of simultaneous accesses to the same physical memory decreases. Because the shuffler is programmable, address shuffling can be adapted to application-specific memory access patterns. The proposed shuffling algorithm relocates partitioned memory sub-pages based on a memory access conflict graph obtained by profiling the memory access pattern of an application. We demonstrate that the shuffled sub-pages can be represented by a cyclic linked list, which enables partial address shuffling with a minimal number of shuffling table entries and thereby reduces hardware complexity. The programmable address shuffler reduces the number of access conflicts by 83% for pitch-shifting audio decompression.
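The idea of partial, table-driven address shuffling can be illustrated with a minimal Python sketch. It is not the implementation described above: the address layout (a sub-page index in the upper bits, an offset in the lower bits), the bank assignment (consecutive sub-pages grouped into one bank), and the names `translate`, `cycle_to_table`, and `count_conflicts` are all assumptions made for illustration. The sketch assumes that relocated sub-pages form a cycle (a moves to b, b moves to c, c moves back to a), so the table needs entries only for the sub-pages in that cycle. Construction of the conflict graph from a profile is not shown.

```python
from itertools import combinations


def translate(addr, table, subpage_bits):
    """Map a requested address to its shuffled address.

    Sub-pages without a table entry keep their location (partial shuffling).
    """
    subpage = addr >> subpage_bits
    offset = addr & ((1 << subpage_bits) - 1)
    return (table.get(subpage, subpage) << subpage_bits) | offset


def cycle_to_table(cycle):
    """Expand a cyclic list [a, b, c] into table entries a->b, b->c, c->a."""
    return {src: cycle[(i + 1) % len(cycle)] for i, src in enumerate(cycle)}


def count_conflicts(trace, table, subpage_bits, subpages_per_bank):
    """Count pairs of simultaneous accesses that fall into the same bank.

    `trace` is a list of tuples; each tuple holds the addresses that the
    cores request in the same cycle (a hypothetical profiling format).
    """
    conflicts = 0
    for step in trace:
        banks = [
            (translate(a, table, subpage_bits) >> subpage_bits) // subpages_per_bank
            for a in step
        ]
        conflicts += sum(1 for x, y in combinations(banks, 2) if x == y)
    return conflicts


# Toy setup (assumed): 16-word sub-pages, two sub-pages per bank, two cores.
SUBPAGE_BITS = 4
SUBPAGES_PER_BANK = 2
trace = [(0x00, 0x10), (0x05, 0x1A), (0x20, 0x30)]

# Swapping sub-pages 1 and 2 is a cycle of length two: two table entries.
table = cycle_to_table([1, 2])

before = count_conflicts(trace, {}, SUBPAGE_BITS, SUBPAGES_PER_BANK)
after = count_conflicts(trace, table, SUBPAGE_BITS, SUBPAGES_PER_BANK)
print(before, after)
```

In this toy trace, every simultaneous pair initially targets the same bank. Exchanging sub-pages 1 and 2 separates them, and the table holds only the two relocated sub-pages rather than one entry per sub-page.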