Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Run-time disambiguation: coping with statically unpredictable dependencies
IEEE Transactions on Computers
Accurate analysis of array references
Accurate analysis of array references
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Data preload for superscalar and VLIW processors
Data preload for superscalar and VLIW processors
Dynamic memory disambiguation using the memory conflict buffer
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
A manual for the CHAOS runtime library
A manual for the CHAOS runtime library
Memory bank and register allocation in software synthesis for ASIPs
ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Exploiting dual data-memory banks in digital signal processors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The design, implementation, and evaluation of Jade
ACM Transactions on Programming Languages and Systems (TOPLAS)
Pointer analysis for multithreaded programs
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Supporting systolic and memory communication in iWarp
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Dependence Analysis
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
The RAW benchmark suite: computation structures for general purpose computing
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Maps: a compiler-managed memory system for software-exposed architectures
Maps: a compiler-managed memory system for software-exposed architectures
Heterogeneous memory management for embedded systems
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
An optimal memory allocation scheme for scratch-pad-based embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Fast memory bank assignment for fixed-point digital signal processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
High-level power analysis for on-chip networks
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
High-level synthesis using computation-unit integrated memories
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Semi-sparse flow-sensitive pointer analysis
Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
IEICE - Transactions on Information and Systems
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
Exploiting both pipelining and data parallelism with SIMD reconfigurable architecture
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Hi-index | 14.98 |
Technological trends require that future scalable microprocessors be decentralized. Applying these trends toward memory systems shows that the size of the cache accessible in a single cycle will decrease in a future generation of chips. Thus, a bank-exposed memory system comprised of small, decentralized cache banks must eventually replace that of a monolithic cache. This paper considers how to effectively use such a memory system for sequential programs. This paper presents Maps, the software technology central to bank-exposed architectures, which are architectures with bank-exposed memory systems. Maps solves the problem of bank disambiguation驴that of determining at compile-time which bank a memory reference is accessing. Bank disambiguation is important because it enables the compile-time optimization for data locality, where data can be placed close to the computation that requires it. Two methods for bank disambiguation are presented: equivalence-class unification and modulo unrolling. Experimental results are presented using a compiler for the MIT Raw machine, a bank-exposed architecture that relies on the compiler to 1) manage its memory and 2) orchestrate its instruction level parallelism and communication. Results on Raw using sequential codes demonstrate that using bank disambiguation improves performance by a factor of 3 to 5 over using ILP alone.