Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy

Authors:
Athanasios Milidonis; Nikolaos Alachiotis; Vasileios Porpodas; Harris Michail; Georgios Panagiotakopoulos; Athanasios P. Kakarountas; Costas E. Goutis
Affiliations:
VLSI Design Lab., Electrical & Computer Engineering Department, University of Patras, Patras, Greece (all authors)
Venue:
Journal of Signal Processing Systems
Year:
2010

Abstract

We present an architecture of decoupled processors with a memory hierarchy consisting only of scratch-pad memories and a main memory. The architecture exploits the more efficient prefetching of decoupled processors, which rely on the parallelism between address computation and application data processing that is present mainly in streaming applications. Combined with the ability of scratch-pad memories to store data without conflict misses and at low energy per access, this benefit contributes significantly to system performance. The application code is split into two parallel programs. The first runs on the Access processor and computes the addresses of the data in the memory hierarchy. The second processes the application data and runs on the Execute processor, whose address space is limited to the register file addresses. Every block transfer through the memory hierarchy, up to the Execute processor's register file, is controlled by the Access processor and the DMA units. This strongly differentiates the architecture from traditional uniprocessors and from existing decoupled processors with cache memory hierarchies. The architecture is compared in performance with (a) uniprocessor architectures with scratch-pad memory hierarchies, (b) uniprocessor architectures with cache memory hierarchies, and (c) existing decoupled architectures, and it shows higher normalized performance. The gain stems from the efficient data transfers that the scratch-pad memory hierarchy provides, combined with the ability of the decoupled processors to eliminate memory latency by using memory management techniques to transfer data instead of fixed prefetching methods. Experimental results show a performance increase of up to almost 2 times over uniprocessor architectures with scratch-pad memories and up to 3.7 times over those with caches. The proposed architecture achieves this performance without a penalty in energy-delay product.
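The split between the Access and Execute programs can be illustrated with a small software model. The sketch below is an illustration under stated assumptions, not the authors' design: two Python threads stand in for the two processors, a bounded `queue.Queue` stands in for the hardware path into the Execute processor's register file, and `dma_copy`, `BLOCK`, `run_decoupled` and the scaling kernel are hypothetical names chosen for the example. The Access thread alone computes addresses and moves blocks from "main memory" into a "scratch-pad" list; the Execute thread only consumes operands in order.

```python
import queue
import threading

# Hypothetical model of a decoupled access/execute pipeline (not the paper's
# implementation). Main memory and the scratch-pad are plain Python lists.

BLOCK = 4        # assumed scratch-pad block size (elements per DMA transfer)
SENTINEL = None  # marks the end of the operand stream


def dma_copy(main_mem, start, length):
    """Model a DMA block transfer from main memory into a scratch-pad buffer."""
    return list(main_mem[start:start + length])


def access_processor(main_mem, base, n, load_q):
    """Computes every address and moves the data; performs no arithmetic on it."""
    for blk_start in range(base, base + n, BLOCK):
        length = min(BLOCK, base + n - blk_start)
        spm = dma_copy(main_mem, blk_start, length)  # main memory -> scratch-pad
        for value in spm:                             # scratch-pad -> Execute side
            # put() blocks while the queue is full, which bounds how far
            # the Access thread can run ahead of the Execute thread.
            load_q.put(value)
    load_q.put(SENTINEL)


def execute_processor(load_q, coeff, results):
    """Consumes operands in FIFO order; it never computes a memory address."""
    while True:
        value = load_q.get()
        if value is SENTINEL:
            break
        results.append(coeff * value)


def run_decoupled(main_mem, base, n, coeff, queue_depth=8):
    load_q = queue.Queue(maxsize=queue_depth)
    results = []
    ap = threading.Thread(target=access_processor, args=(main_mem, base, n, load_q))
    ep = threading.Thread(target=execute_processor, args=(load_q, coeff, results))
    ap.start()
    ep.start()
    ap.join()
    ep.join()
    return results


if __name__ == "__main__":
    memory = list(range(100))
    print(run_decoupled(memory, base=10, n=6, coeff=2))  # [20, 22, 24, 26, 28, 30]
```

In this model the queue depth plays the role of the decoupling distance: the Access thread may prefetch at most `queue_depth` operands ahead before it stalls, while block-sized `dma_copy` calls mimic software-managed transfers instead of a fixed hardware prefetch policy.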