Transparent data-memory organizations for digital signal processors

Authors:
Sadagopan Srinivasan;Vinodh Cuppu;Bruce Jacob
Affiliations:
University of Maryland at College Park, College Park, MD;University of Maryland at College Park, College Park, MD;University of Maryland at College Park, College Park, MD
Venue:
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2001

Abstract

Today's digital signal processors (DSPs), unlike general-purpose processors, use a non-uniform addressing model in which the primary components of the memory system (the DRAM and dual tagless SRAMs) are referenced through completely separate segments of the address space. The recent trend of programming DSPs in high-level languages instead of assembly code has exposed this memory model as a potential weakness, as the model makes for a poor compiler target. In many of today's high-performance DSPs this non-uniform model is being replaced by a uniform model: a transparent organization like that of most general-purpose systems, in which all memory structures share the same address space as the DRAM system. In such a memory organization, one must replace the DSP's tagless SRAMs with something resembling a general-purpose cache. This study investigates the performance of a range of traditional and slightly non-traditional cache organizations for a high-performance DSP, the Texas Instruments 'C6000 VLIW DSP. The traditional cache organizations range from a fraction of a kilobyte to several kilobytes; they approach the SRAM performance and, for some benchmarks, beat it. In the non-traditional cache organizations, rather than simply adding tags to the large on-chip SRAM structure, we take advantage of the relatively regular memory access behavior of most DSP applications and replace the tagless SRAM with a near-traditional cache that uses a very small number of wide blocks. This performs similarly to the traditional caches but uses less storage. In general, we find that one can achieve nearly the same performance as a tagless SRAM while using a much smaller footprint.
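
The intuition behind a cache with a very small number of wide blocks can be illustrated with a toy model. The sketch below is not the paper's simulator or methodology: the function names (`hit_rate`, `streaming_trace`), the fully associative LRU policy, the block sizes, and the synthetic access trace are all assumptions chosen for illustration. It counts hits only and ignores the higher fill cost of wide blocks, so it shows why regular, sequential access favors wide blocks, not how the organizations compare in execution time.

```python
from collections import OrderedDict


def hit_rate(addresses, num_blocks, block_bytes):
    """Simulate a fully associative LRU cache (an assumed policy).

    Returns the fraction of accesses that hit.
    """
    cache = OrderedDict()  # block tag -> None, ordered oldest to newest
    hits = 0
    for addr in addresses:
        tag = addr // block_bytes  # which block this byte address falls in
        if tag in cache:
            hits += 1
            cache.move_to_end(tag)  # mark as most recently used
        else:
            if len(cache) >= num_blocks:
                cache.popitem(last=False)  # evict least recently used block
            cache[tag] = None
    return hits / len(addresses)


def streaming_trace(n_samples, word_bytes=4, base_a=0x0000, base_b=0x10000):
    """Hypothetical DSP-like trace: interleaved sequential reads of two
    arrays, as in a dot-product or FIR inner loop."""
    trace = []
    for i in range(n_samples):
        trace.append(base_a + i * word_bytes)
        trace.append(base_b + i * word_bytes)
    return trace


trace = streaming_trace(1024)

# "Traditional" shape: many narrow blocks (64 x 32 B = 2 KB of data storage).
traditional = hit_rate(trace, num_blocks=64, block_bytes=32)

# "Wide-block" shape: few wide blocks (4 x 256 B = 1 KB of data storage).
wide = hit_rate(trace, num_blocks=4, block_bytes=256)

print(f"traditional: {traditional:.4f}, wide-block: {wide:.4f}")
```

On this purely sequential trace every miss is a compulsory miss, so the hit rate depends only on block width: the four-block organization misses once per 64 words while the 64-block organization misses once per 8 words, despite holding half as much data. Irregular access patterns would narrow or reverse that gap.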