A conflict-resolving parallel data memory system for the Transport Triggered Architecture (TTA) is described. The architecture is generic and reusable, so it can support various application-specific designs. With parallel memory, multi-port memory, which consumes more area and power, can be replaced with single-port memory modules. The number of ports can also be increased beyond what a design library offers for multi-port memories. In an FFT TTA example, the dual-port data memory was replaced by the proposed architecture. To avoid memory conflicts, the original code was rescheduled and the TTA core was regenerated for the new schedule. Compared with the proposed parallel memory, the original memory required 3.38 times the area and 1.70 times the energy. In this case, the energy consumption of the processor core increased, so system energy consumption remained about the same. However, the original system required 1.89 times the area.
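The idea of serving several simultaneous accesses from single-port modules can be illustrated with a minimal sketch. It assumes low-order interleaving (address modulo the number of modules) as the module-assignment function and resolves conflicts by spreading requests to the same module over successive cycles. The abstract does not state which mapping or resolution policy the described system uses, so the names `bank_of` and `schedule_accesses` and the policy itself are hypothetical.

```python
from collections import defaultdict


def bank_of(addr, num_banks):
    """Assumed module-assignment function: low-order interleaving."""
    return addr % num_banks


def schedule_accesses(addresses, num_banks):
    """Group simultaneous requests into conflict-free cycles.

    Each single-port module serves at most one request per cycle, so
    requests that map to the same module are spread over successive cycles.
    """
    per_bank = defaultdict(list)
    for addr in addresses:
        per_bank[bank_of(addr, num_banks)].append(addr)

    # The busiest module determines how many cycles are needed.
    num_cycles = max((len(queue) for queue in per_bank.values()), default=0)

    cycles = []
    for c in range(num_cycles):
        cycles.append([queue[c] for queue in per_bank.values() if c < len(queue)])
    return cycles


if __name__ == "__main__":
    # Two requests to different modules complete in one cycle.
    print(schedule_accesses([0, 1], num_banks=2))
    # Two requests to the same module need two cycles.
    print(schedule_accesses([0, 2], num_banks=2))
```

In this sketch, two requests that fall in different modules behave like a dual-port access, while two requests to the same module cost an extra cycle. That extra cycle is the kind of conflict that rescheduling the code, as described above, aims to avoid.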