On the effective bandwidth of interleaved memories in vector processor systems
IEEE Transactions on Computers
Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Performance of image and video processing with general-purpose processors and media ISA extensions
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Application-specific memory management for embedded systems using software-controlled caches
Proceedings of the 37th Annual Design Automation Conference
On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Energy aware compilation for DSPs with SIMD instructions
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Reducing Interference Among Vector Accesses in Interleaved Memories
IEEE Transactions on Computers
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Vectorizing for a SIMdD DSP architecture
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Compiler-decided dynamic memory allocation for scratch-pad based embedded systems
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance
An Empirical Study On the Vectorization of Multimedia Applications for Multimedia Extensions
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Memory Systems for Image Processing
IEEE Transactions on Computers
The Organization and Use of Parallel Memories
IEEE Transactions on Computers
Trimaran: an infrastructure for research in instruction-level parallelism
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A compiler-based approach for dynamically managing scratch-pad memories in embedded systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
The single instruction multiple data (SIMD) architecture is very efficient for executing arithmetic intensive programs, but frequently suffers from data-alignment problems. The data-alignment problem not only induces extra time overhead but also hinders automatic vectorization of the SIMD compiler. In this paper, we compare three on-chip memory systems, which are single-bank, multi-bank, and multi-port, for the SIMD architecture to resolve the data-alignment problems. The single-bank memory is the simplest, but supports only the aligned accesses. The multi-bank memory requires a little higher complexity, but enables the unaligned accesses and the stride accesses with a bank-conflict limitation. The multi-port memory is capable of both the unaligned and stride accesses without any restriction, but needs quite much expensive hardware. We also developed a vectorizing compiler that can conduct dynamic memory allocation and SIMD code generation. The performances of the three memory systems with our SIMD compiler are evaluated using several digital signal processing kernels and the MPEG2 encoder. The experimental results show that the multi-bank memory can carry out MPEG2 encoding 5.8 times faster, whereas the single-bank memory only achieves 2.9 times speed-up when employed in a multimedia system with a 2-issue host processor and an 8-way SIMD coprocessor. The multi-port memory obviously shows the best performance, which is however an impractical improvement over the multi-bank memory when the hardware cost is considered.