Today's digital signal processors (DSPs), unlike general-purpose processors, use a non-uniform addressing model in which the primary components of the memory system (the DRAM and dual tagless SRAMs) are referenced through completely separate segments of the address space. The recent trend of programming DSPs in high-level languages instead of assembly code has exposed this memory model as a potential weakness, as the model makes for a poor compiler target. In many of today's high-performance DSPs this non-uniform model is being replaced by a uniform model: a transparent organization like that of most general-purpose systems, in which all memory structures share the same address space as the DRAM system. In such a memory organization, one must replace the DSP's tagless SRAMs with something resembling a general-purpose cache. This study investigates the performance of a range of traditional and slightly non-traditional cache organizations for a high-performance DSP, the Texas Instruments 'C6000 VLIW DSP. The traditional cache organizations range from a fraction of a kilobyte to several kilobytes; they approach the SRAM performance and, for some benchmarks, beat it. In the non-traditional cache organizations, rather than simply adding tags to the large on-chip SRAM structure, we take advantage of the relatively regular memory access behavior of most DSP applications and replace the tagless SRAM with a near-traditional cache that uses a very small number of wide blocks. This performs similarly to the traditional caches but uses less storage. In general, we find that one can achieve nearly the same performance as a tagless SRAM while using a much smaller footprint.
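The idea of a cache built from a very small number of wide blocks can be illustrated with a toy model. The sketch below is not the study's simulator and does not reproduce its results: the class `WideBlockCache`, the helpers `fir_like_trace` and `simulate`, the fully associative LRU policy, and the FIR-style access pattern with its base addresses are all assumptions chosen for illustration. It only shows why regular, streaming accesses can be covered by a handful of wide blocks.

```python
from collections import OrderedDict


class WideBlockCache:
    """Toy fully associative LRU cache with a few wide blocks (hypothetical model)."""

    def __init__(self, num_blocks, block_bytes):
        self.num_blocks = num_blocks
        self.block_bytes = block_bytes
        self.tags = OrderedDict()  # block tag -> True, ordered from LRU to MRU
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one byte-address access; return True on a hit."""
        tag = addr // self.block_bytes
        if tag in self.tags:
            self.tags.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.tags) >= self.num_blocks:
            self.tags.popitem(last=False)  # evict the least recently used block
        self.tags[tag] = True
        return False


def fir_like_trace(n_samples, taps, elem_bytes=4,
                   coeff_base=0x0000, data_base=0x10000):
    """Yield addresses of an assumed FIR-style loop: one coefficient read
    and one sample read per tap (base addresses are made up)."""
    for i in range(n_samples):
        for k in range(taps):
            yield coeff_base + k * elem_bytes
            yield data_base + (i + k) * elem_bytes


def simulate(num_blocks, block_bytes, trace):
    """Run a trace through a fresh cache and return the cache object."""
    cache = WideBlockCache(num_blocks, block_bytes)
    for addr in trace:
        cache.access(addr)
    return cache


if __name__ == "__main__":
    for blocks, width in [(2, 512), (2, 16)]:
        c = simulate(blocks, width, fir_like_trace(64, 16))
        print(f"{blocks} blocks x {width} B: hits={c.hits} misses={c.misses}")
```

In this toy trace the coefficient array and the sample window each fit in one wide block, so two 512-byte blocks (1 KB of data storage) miss only on first touch, while two narrow 16-byte blocks thrash. Real DSP workloads and the organizations evaluated in the study are more varied than this single loop.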