Embedded intelligent SRAM

Authors:
Prabhat Jain;G. Edward Suh;Srinivas Devadas
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA;Massachusetts Institute of Technology, Cambridge, MA
Venue:
Proceedings of the 40th annual Design Automation Conference
Year:
2003


Abstract

Many embedded systems use a simple pipelined RISC processor for computation and an on-chip SRAM for data storage. We present an enhancement called Intelligent SRAM (ISRAM), which places a small computation unit with an accumulator near the on-chip SRAM. The computation unit can operate on two words from the same SRAM row, or on one word from the SRAM and one from the accumulator. Supporting the computation unit requires only a few additional instructions. We also present a computation partitioning algorithm that, given the data flow graph of a program, assigns each computation either to the processor or to the new computation unit. The performance improvement comes from reducing the number of SRAM accesses, the number of instructions, and the number of pipeline stalls relative to performing the same operations in the processor. Experimental results on various benchmarks show a speedup of up to 1.48x with our enhancement.
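
The two operand modes described in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the class `ToyISRAM`, its methods `op_row_row` and `op_row_acc`, the helper functions, and the cost model (one SRAM access per row read, one access per processor load) are all assumptions made for illustration. It only shows why combining two words from one row read can lower the access count for a simple reduction.

```python
from dataclasses import dataclass
from operator import add


@dataclass
class ToyISRAM:
    """Hypothetical model: SRAM rows plus a near-memory accumulator."""
    rows: list             # each row is a list of integer words
    acc: int = 0           # accumulator of the computation unit
    row_accesses: int = 0  # assumed cost metric: one count per row read

    def _read_row(self, row):
        self.row_accesses += 1
        return self.rows[row]

    def op_row_row(self, row, col_a, col_b, fn):
        # Mode 1: both operands come from the same SRAM row (one row read).
        words = self._read_row(row)
        self.acc = fn(words[col_a], words[col_b])
        return self.acc

    def op_row_acc(self, row, col, fn):
        # Mode 2: one operand from the SRAM, the other from the accumulator.
        words = self._read_row(row)
        self.acc = fn(self.acc, words[col])
        return self.acc


def processor_sum(rows, row):
    # Assumed baseline: the processor loads each word with a separate access.
    total, accesses = 0, 0
    for word in rows[row]:
        accesses += 1
        total += word
    return total, accesses


def isram_sum(mem, row):
    # Sum one row using the two modes; returns (result, row reads used).
    start = mem.row_accesses
    mem.op_row_row(row, 0, 1, add)
    for col in range(2, len(mem.rows[row])):
        mem.op_row_acc(row, col, add)
    return mem.acc, mem.row_accesses - start


if __name__ == "__main__":
    data = [[1, 2, 3, 4]]
    print(processor_sum(data, 0))        # (sum, accesses) for the baseline
    print(isram_sum(ToyISRAM(data), 0))  # (sum, row reads) for the toy unit
```

Under these assumptions, summing a four-word row takes four accesses in the baseline and three row reads in the toy unit. The actual savings reported in the paper also depend on instruction count and pipeline stalls, which this sketch does not model.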