Multiprocessor System-on-Chip designs with active memory processors for higher memory efficiency

Authors:
Junhee Yoo;Sungjoo Yoo;Kiyoung Choi
Affiliations:
Seoul National University;POSTECH;Seoul National University
Venue:
Proceedings of the 46th Annual Design Automation Conference
Year:
2009

Citing 11
Cited 1

Active messages: a mechanism for integrated communication and computation

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Effective jump-pointer prefetching for linked data structures

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Application-specific memory management for embedded systems using software-controlled caches

Proceedings of the 37th Annual Design Automation Conference
Data prefetch mechanisms

ACM Computing Surveys (CSUR)
Cache-oblivious priority queue and graph algorithm applications

STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Cache-Oblivious Algorithms

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Cache-oblivious B-trees

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Memory Coloring: A Compiler Approach for Scratchpad Memory Management

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies

Proceedings of the 43rd annual Design Automation Conference
Active memory operations

Proceedings of the 21st annual international conference on Supercomputing

A distributed interleaving scheme for efficient access to WideIO DRAM memory

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Memory access latency and memory-related operations are often the performance bottleneck in parallel applications. In this paper, we present a concept of active memory operations which is an on-chip network transaction that operates based on the microcode provided by the software designer. Utilizing the active memory operation, we can replace multiple transactions of memory accesses over the on-chip network and related local processing element computation with a smaller number of high-level transactions and near-memory computation. We implemented a processor called active memory processor which is located near the memory and executes the active memory operations. In our case studies, we applied the concept to three real-world applications (parallelized JPEG, FFT, and text indexing for data mining) running on a 36-tile architecture with 32 cores and 4 memories and found that the programmable transaction approach can improve performance by 34.3% to 618% at the cost of additional design effort.