Programming the FlexRAM parallel intelligent memory system

Authors:
Basilio B. Fraguela;Jose Renau;Paul Feautrier;David Padua;Josep Torrellas
Affiliations:
Universidade da Coruña, Spain;University of Illinois at Urbana-Champaign, USA;LIP, Ecole Normale Supérieure de Lyon, France;University of Illinois at Urbana-Champaign, USA;University of Illinois at Urbana-Champaign, USA
Venue:
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2003

Abstract

In an intelligent memory architecture, the main memory of a computer is enhanced with many simple processors. The result is a highly parallel, heterogeneous machine that is able to exploit computation in the main memory. While several instantiations of this architecture have been proposed, the question of how to effectively program them with little effort has remained a major challenge.

In this paper, we show how to effectively hand-program an intelligent memory architecture at a high level and with very modest effort. We use FlexRAM as a prototype architecture. To program it, we propose a family of high-level compiler directives inspired by OpenMP called CFlex. Such directives enable the processors in memory to execute the program in cooperation with the main processor. In addition, we propose libraries of highly optimized functions called Intelligent Memory Operations (IMOs). These functions program the processors in memory through CFlex, but make them completely transparent to the programmer. Simulation results show that, with CFlex and IMOs, a server with 64 simple processors in memory runs on average 10 times faster than a conventional server. Moreover, a set of conventional programs with 240 lines on average is transformed into CFlex parallel form with only 7 CFlex directives and 2 additional statements on average.
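The abstract does not show CFlex syntax, so the sketch below does not attempt to reproduce it. It only illustrates the directive-based style that CFlex is said to be inspired by, using standard OpenMP pragmas as a stand-in. The function names `scale_vector` and `sum_vector` are hypothetical and chosen for illustration; nothing here targets FlexRAM or its in-memory processors.

```c
#include <stddef.h>

/* Scale every element of a vector in place.
 * The pragma is standard OpenMP, used only as a stand-in for the
 * directive style; it is NOT CFlex syntax. A compiler without OpenMP
 * support ignores the pragma and runs the loop sequentially. */
void scale_vector(double *v, size_t n, double factor)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        v[i] *= factor;
    }
}

/* Sum the elements of a vector.
 * The reduction clause gives each thread a private partial sum and
 * combines the partial sums when the loop ends. */
double sum_vector(const double *v, size_t n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += v[i];
    }
    return total;
}
```

In both functions the sequential loop body is unchanged and a single directive line expresses the parallelism, which is the kind of low annotation overhead the abstract reports (7 directives and 2 additional statements per program on average).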