Compiler Support for Scalable and Efficient Memory Systems

Authors:
Rajeev Barua;Walter Lee;Saman Amarasinghe;Anant Agarwal
Affiliations:
Univ. of Maryland, College Park, MD;MIT Laboratory for Computer Science, Cambridge, MA;MIT Laboratory for Computer Science, Cambridge, MA;MIT, Cambridge, MA
Venue:
IEEE Transactions on Computers
Year:
2001

Abstract

Technological trends require that future scalable microprocessors be decentralized. Applied to memory systems, these trends imply that the size of the cache accessible in a single cycle will decrease in future generations of chips. Thus, a bank-exposed memory system composed of small, decentralized cache banks must eventually replace the monolithic cache. This paper considers how to use such a memory system effectively for sequential programs. It presents Maps, the software technology central to bank-exposed architectures, that is, architectures with bank-exposed memory systems. Maps solves the problem of bank disambiguation: determining at compile time which bank a memory reference accesses. Bank disambiguation is important because it enables compile-time optimization for data locality, where data can be placed close to the computation that requires it. Two methods for bank disambiguation are presented: equivalence-class unification and modulo unrolling. Experimental results are presented using a compiler for the MIT Raw machine, a bank-exposed architecture that relies on the compiler to 1) manage its memory and 2) orchestrate its instruction-level parallelism and communication. Results on Raw using sequential codes demonstrate that bank disambiguation improves performance by a factor of 3 to 5 over using ILP alone.
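
The abstract names modulo unrolling without describing it. The following is a minimal sketch of the general idea, not the paper's implementation. It assumes array elements are interleaved across banks by low-order index (element i lives in bank i mod N) and that the loop is unrolled by the number of banks. The function names `bank_of`, `banks_touched_rolled`, and `banks_touched_unrolled` are hypothetical and chosen only for illustration.

```python
def bank_of(index, num_banks):
    """Bank holding element `index`, assuming low-order interleaving."""
    return index % num_banks


def banks_touched_rolled(n, num_banks):
    """Banks touched by the single reference A[i] in `for i in range(n)`."""
    return {bank_of(i, num_banks) for i in range(n)}


def banks_touched_unrolled(n, num_banks):
    """Banks touched by each reference A[i + k] after unrolling by num_banks.

    The unrolled loop steps i by num_banks and its body contains the
    references A[i + 0], A[i + 1], ..., A[i + num_banks - 1].
    Returns one set of banks per reference, in order of k.
    """
    per_reference = []
    for k in range(num_banks):
        touched = {
            bank_of(i + k, num_banks)
            for i in range(0, n - num_banks + 1, num_banks)
        }
        per_reference.append(touched)
    return per_reference


if __name__ == "__main__":
    # Before unrolling, the one reference visits every bank over time.
    print(banks_touched_rolled(16, 4))
    # After unrolling, each reference stays on a single, statically known bank.
    print(banks_touched_unrolled(16, 4))
```

Under these assumptions, the reference in the original loop touches all banks, so its bank cannot be known at compile time. After unrolling, each reference in the loop body touches exactly one bank, which is the property bank disambiguation needs.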