An Efficient Memory Organization for High-ILP Inner Modem Baseband SDR Processors

Authors:
Bjorn De Sutter;Osman Allam;Praveen Raghavan;Roeland Vandebriel;Hans Cappelle;Tom Vander Aa;Bingfeng Mei
Affiliations:
Ghent University and Vrije Universiteit Brussel, Ghent, Belgium 9000;Interuniversity Micro-Electronics Center (IMEC), Heverlee, Belgium 3001;Interuniversity Micro-Electronics Center (IMEC), Heverlee, Belgium 3001;Interuniversity Micro-Electronics Center (IMEC), Heverlee, Belgium 3001;Interuniversity Micro-Electronics Center (IMEC), Heverlee, Belgium 3001;Interuniversity Micro-Electronics Center (IMEC), Heverlee, Belgium 3001;Interuniversity Micro-Electronics Center (IMEC), Heverlee, Belgium 3001
Venue:
Journal of Signal Processing Systems
Year:
2010

Abstract

This paper presents a memory organization for SDR inner modem baseband processors that focus on exploiting ILP. The organization uses power-efficient, single-ported, interleaved scratch-pad memory banks to provide enough bandwidth to a high-ILP processor. A system of queues in the memory interface resolves bank conflicts among the single-ported banks and spreads long bursts of conflicting accesses to the same bank over time. Bank address rotation spreads long bursts of conflicting accesses over multiple banks. All proposed techniques have been implemented in hardware and are evaluated for a number of different wireless communication standards. For the 802.11a/n benchmarks, the proposed organization reduces the overhead of stall cycles resulting from unresolved bank conflicts to below 2%. For 3GPP-LTE, the most demanding wireless standard we evaluated, the overhead is reduced to less than 0.13%. This is achieved with little energy and area overhead, and without any bank-aware compiler support.
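The effect of bank interleaving and address rotation can be illustrated with a small sketch. The abstract does not give the paper's actual rotation function, bank count, or queue depths, so everything below is an assumption made for illustration: a bank count of 4, a simple skewed mapping that adds the row index to the word address, and the hypothetical helper names `plain_bank`, `rotated_bank`, `bank_histogram`, and `drain_cycles`. The drain estimate assumes each single-ported bank serves one access per cycle and that the queues are deep enough to hold the whole burst.

```python
from collections import Counter

NUM_BANKS = 4  # assumed bank count, chosen only for this illustration


def plain_bank(word_addr, num_banks=NUM_BANKS):
    """Plain low-order interleaving: consecutive words go to consecutive banks."""
    return word_addr % num_banks


def rotated_bank(word_addr, num_banks=NUM_BANKS):
    """Assumed rotation scheme (not taken from the paper): shift the bank
    index by the row number so a stride of num_banks no longer hits one bank."""
    row = word_addr // num_banks
    return (word_addr + row) % num_banks


def bank_histogram(addresses, bank_fn):
    """Count how many accesses of a burst land in each bank."""
    return Counter(bank_fn(a) for a in addresses)


def drain_cycles(addresses, bank_fn):
    """Cycles needed to serve a burst issued at once, assuming one access
    per bank per cycle and unbounded queues in front of each bank."""
    return max(bank_histogram(addresses, bank_fn).values())


# A burst of 8 accesses with stride NUM_BANKS: the worst case for plain interleaving.
strided = [i * NUM_BANKS for i in range(8)]

print(dict(bank_histogram(strided, plain_bank)))    # all 8 accesses hit bank 0
print(dict(bank_histogram(strided, rotated_bank)))  # 2 accesses in each of the 4 banks
print(drain_cycles(strided, plain_bank), drain_cycles(strided, rotated_bank))
```

In this toy model the strided burst takes 8 cycles to drain under plain interleaving and 2 cycles under the assumed rotation, which is the kind of spreading over banks the abstract attributes to bank address rotation. The queues play the complementary role of absorbing whatever conflicts remain over time.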