Template-based memory access engine for accelerators in SoCs

Authors:
Bin Li; Zhen Fang; Ravi Iyer
Affiliations:
Intel Labs, Hillsboro, Oregon; Intel Labs, Hillsboro, Oregon; Intel Labs, Hillsboro, Oregon
Venue:
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Year:
2011


Abstract

Advances in semiconductor technology allow an increasing number of accelerators to be integrated onto a single SoC. Accelerators in SoCs often require deterministic data access. However, as more applications run simultaneously, memory access latency can vary significantly due to contention. To address this problem, we propose a template-based memory access engine (MAE) for accelerators in SoCs. The proposed MAE handles several common memory access patterns observed for near-future accelerators. Our evaluation results show that the proposed MAE significantly reduces memory access latency and jitter, making it very effective for accelerators in SoCs.