Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
ASPEN: Towards Effective Simulation of Threads and Engines in Evolving Platforms
MASCOTS '04 Proceedings of the The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Accurate and Complexity-Effective Spatial Pattern Prediction
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Atomic Vector Operations on Chip Multiprocessors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Accelerating mobile augmented reality on a handheld platform
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Hi-index | 0.00 |
With the rapid progress in semiconductor technologies, more and more accelerators can be integrated onto a single SoC chip. In SoCs, accelerators often require deterministic data access. However, as more and more applications are running simultaneous, latency can vary significantly due to contention. To address this problem, we propose a template-based memory access engine (MAE) for accelerators in SoCs. The proposed MAE can handle several common memory access patterns observed for near-future accelerators. Our evaluation results show that the proposed MAE can significantly reduce memory access latency and jitter, thus very effective for accelerators in SoCs.