In an intelligent memory architecture, the main memory of a computer is enhanced with many simple processors. The result is a highly parallel, heterogeneous machine that is able to exploit computation in the main memory. While several instantiations of this architecture have been proposed, the question of how to program them effectively with little effort has remained a major challenge.

In this paper, we show how to effectively hand-program an intelligent memory architecture at a high level and with very modest effort. We use FlexRAM as a prototype architecture. To program it, we propose CFlex, a family of high-level compiler directives inspired by OpenMP. These directives enable the processors in memory to execute the program in cooperation with the main processor. In addition, we propose libraries of highly optimized functions called Intelligent Memory Operations (IMOs). These functions program the processors in memory through CFlex, but make them completely transparent to the programmer. Simulation results show that, with CFlex and IMOs, a server with 64 simple processors in memory runs on average 10 times faster than a conventional server. Moreover, a set of conventional programs averaging 240 lines is transformed into CFlex parallel form with, on average, only 7 CFlex directives and 2 additional statements.