Automatic Code Mapping on an Intelligent Memory Architecture

Authors:
Yan Solihin; Jaejin Lee; Josep Torrellas
Affiliations:
Univ. of Illinois, Urbana-Champaign; Michigan State Univ. Lansing; Univ. of Illinois, Urbana-Champaign
Venue:
IEEE Transactions on Computers
Year:
2001

Abstract

This paper presents an algorithm that automatically maps code onto a generic intelligent memory system consisting of a high-end host processor and a simpler memory processor. To achieve high performance on this type of architecture, the code must be partitioned and scheduled so that each section is assigned to the processor on which it runs most efficiently, and the two processors should overlap their execution as much as possible. Our algorithm maps applications fully automatically, using both static and dynamic information. Using a set of standard applications and a simulated architecture, we obtain average speedups of 1.7 for numerical applications and 1.2 for nonnumerical applications over a single host with plain memory. These speedups are very close to, and often higher than, the ideal speedups on a more expensive multiprocessor system composed of two identical host processors. Our work shows that heterogeneity can be exploited cost-effectively and represents one step toward effectively mapping code on heterogeneous intelligent memory systems.
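
The two goals named in the abstract, assigning each code section to the processor on which it runs faster and overlapping the two processors, can be illustrated with a small toy model. The sketch below is not the paper's algorithm: the `Section` type, its fields, the per-section time estimates, and the pairwise overlap rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A code section with hypothetical time estimates (arbitrary units)."""
    name: str
    host_time: float     # estimated time on the host processor
    mem_time: float      # estimated time on the memory processor
    overlaps_prev: bool  # assumed: may run concurrently with the previous section


def assign(section: Section) -> str:
    """Pick the processor with the lower estimated time (ties go to the host)."""
    return "host" if section.host_time <= section.mem_time else "memory"


def run_time(section: Section) -> float:
    """Estimated time of the section on its assigned processor."""
    return min(section.host_time, section.mem_time)


def mapped_time(sections: list[Section]) -> float:
    """Total time when each section runs on its faster processor.

    Toy overlap rule: two adjacent sections overlap only if the second one
    is marked overlaps_prev and the two are assigned to different processors.
    """
    total = 0.0
    i = 0
    while i < len(sections):
        cur = sections[i]
        nxt = sections[i + 1] if i + 1 < len(sections) else None
        if nxt is not None and nxt.overlaps_prev and assign(nxt) != assign(cur):
            total += max(run_time(cur), run_time(nxt))  # the pair runs concurrently
            i += 2
        else:
            total += run_time(cur)
            i += 1
    return total


def speedup(sections: list[Section]) -> float:
    """Speedup over running every section on the host alone."""
    baseline = sum(s.host_time for s in sections)
    return baseline / mapped_time(sections)


# Made-up estimates, for illustration only.
example = [
    Section("A", host_time=4.0, mem_time=10.0, overlaps_prev=False),
    Section("B", host_time=6.0, mem_time=3.0, overlaps_prev=True),
    Section("C", host_time=2.0, mem_time=5.0, overlaps_prev=False),
]

for s in example:
    print(s.name, "->", assign(s))
print("mapped time:", mapped_time(example))
print("speedup over host only:", speedup(example))
```

In this made-up example, section B runs on the memory processor and overlaps with section A on the host, so the modeled time falls from 12 units on the host alone to 6 units. A real mapping would have to derive such estimates from the static and dynamic information the abstract mentions, which this sketch simply takes as given.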