Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Application-specific memory management for embedded systems using software-controlled caches
Proceedings of the 37th Annual Design Automation Conference
ACM Computing Surveys (CSUR)
Cache-oblivious priority queue and graph algorithm applications
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Multiprocessor system-on-chip data reuse analysis for exploring customized memory hierarchies
Proceedings of the 43rd annual Design Automation Conference
Proceedings of the 21st annual international conference on Supercomputing
A distributed interleaving scheme for efficient access to WideIO DRAM memory
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Hi-index | 0.00 |
Memory access latency and memory-related operations are often the performance bottleneck in parallel applications. In this paper, we present a concept of active memory operations which is an on-chip network transaction that operates based on the microcode provided by the software designer. Utilizing the active memory operation, we can replace multiple transactions of memory accesses over the on-chip network and related local processing element computation with a smaller number of high-level transactions and near-memory computation. We implemented a processor called active memory processor which is located near the memory and executes the active memory operations. In our case studies, we applied the concept to three real-world applications (parallelized JPEG, FFT, and text indexing for data mining) running on a 36-tile architecture with 32 cores and 4 memories and found that the programmable transaction approach can improve performance by 34.3% to 618% at the cost of additional design effort.