High-bandwidth Address Generation Unit

Authors:
Carlo Galuzzi;Chunyang Gou;Humberto Calderón;Georgi N. Gaydadjiev;Stamatis Vassiliadis
Affiliations:
Computer Engineering Laboratory, Electrical Engineering, Mathematics and Computer Science Faculty, TU Delft, Delft, The Netherlands (all authors)
Venue:
Journal of Signal Processing Systems
Year:
2009


Abstract

In this paper we present efficient data fetch circuitry that retrieves several operands from an n-way parallel memory system in a single machine cycle. The proposed address generation unit operates with an improved version of the low-order parallel memory access approach. Our design supports data structures of arbitrary lengths and different odd strides. The experimental results show that our address generation unit can generate eight 32-bit addresses every 6 ns for different strides when implemented on a VIRTEX-II PRO xc2vp30-7ff1696 FPGA device, using only trivial hardware resources.
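
The restriction to odd strides can be illustrated with a small software model. The sketch below assumes plain low-order interleaving (bank = address mod n, row = address div n) over n = 8 banks; it is not the improved scheme or the hardware design from the paper, and the function names `generate_addresses` and `is_conflict_free` are hypothetical. When n is a power of two, any odd stride is coprime to n, so n consecutive elements fall into n distinct banks and can be fetched together.

```python
def generate_addresses(base, stride, n_banks=8):
    """Map n_banks strided element addresses to (bank, row) pairs.

    Assumes plain low-order interleaving: the low bits of the address
    select the bank and the remaining bits select the row in that bank.
    """
    addresses = [base + i * stride for i in range(n_banks)]
    return [(addr % n_banks, addr // n_banks) for addr in addresses]


def is_conflict_free(base, stride, n_banks=8):
    """Return True if the n_banks accesses all hit different banks."""
    banks = [bank for bank, _ in generate_addresses(base, stride, n_banks)]
    return len(set(banks)) == n_banks


if __name__ == "__main__":
    # Odd stride: every bank is used exactly once.
    print(generate_addresses(0, 3))
    print(is_conflict_free(0, 3))   # True
    # Even stride: several accesses collide on the same bank.
    print(is_conflict_free(0, 2))   # False
```

In this model, an even stride shares a factor of two with the bank count, so at most half of the banks are ever touched and the eight accesses cannot complete in one cycle.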