Memory requirements for balanced computer architectures
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Architecture of a message-driven processor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Architectural tradeoffs in parallel computer design
Proceedings of the decennial Caltech conference on VLSI on Advanced research in VLSI
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The design of the Caltech Mosaic C multicomputer
Proceedings of the 1993 symposium on Research on integrated systems
A family of routing and communication chips based on the Mosaic
Proceedings of the 1993 symposium on Research on integrated systems
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
LoPC: modeling contention in parallel algorithms
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
LoGPC: Modeling Network Contention in Message-Passing Programs
IEEE Transactions on Parallel and Distributed Systems
Mips-X RISC Microprocessor
ICCD '92 Proceedings of the 1991 IEEE International Conference on Computer Design on VLSI in Computer & Processors
THE MIT ALEWIFE MACHINE: A LARGE-SCALE DISTRIBUTED-MEMORY MULTIPROCESSOR
THE MIT ALEWIFE MACHINE: A LARGE-SCALE DISTRIBUTED-MEMORY MULTIPROCESSOR
A generic system simulator with novel on-chip cache and throughput models for gigascale integration
A generic system simulator with novel on-chip cache and throughput models for gigascale integration
Logic emulation with virtual wires
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Parametric timing and power macromodels for high level simulation of low-swing interconnects
Proceedings of the 2002 international symposium on Low power electronics and design
Opportunities and challenges in application-tuned circuits and architectures based on nanodevices
Proceedings of the 1st conference on Computing frontiers
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Design and analysis of an NoC architecture from performance, reliability and energy perspective
Proceedings of the 2005 ACM symposium on Architecture for networking and communications systems
A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks
Proceedings of the 33rd annual international symposium on Computer Architecture
A domain-specific approach for software development on Manycore platforms
ACM SIGARCH Computer Architecture News
Manycore performance analysis using timed configuration graphs
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Proposal of an analytical solution for the load imbalance problem in parallel systems
ISPDC'03 Proceedings of the Second international conference on Parallel and distributed computing
Hi-index | 0.00 |
The semiconductor industry roadmap projects that advances in VLSI technology will permit more than one billion transistors on a chip by the year 2010. The MIT Raw microprocessor is a proposed architecture that strives to exploit these chip-level resources by implementing thousands of tiles, each comprising a processing element and a small amount of memory, coupled by a static two-dimensional interconnect. A compiler partitions fine-grain instruction-level parallelism across the tiles and statically schedules intertile communication over the interconnect. Because Raw microprocessors fully expose their internal hardware structure to the software, they can be viewed as a gigantic FPGA with coarse-grained tiles in which software orchestrates communication over static interconnections. One open challenge in Raw architectures is to determine their optimal grain size and balance. The grain size is the area of each tile and the balance is the proportion of area in each tile devoted to memory, processing, communication, and off-chip global I/O. If the total chip area is fixed, higher processing power per tile requires large tiles and hence reduces the total number of tiles on the chip. This paper presents SimpleFit, a novel analytical framework that designers can use to reason about the design space of Raw microprocessors. Our model is also generalizable to multiprocessors on a chip. Based on an architectural model, an application model, and a VLSI cost analysis, the framework computes the performance of applications and uses an optimization process to identify designs that will execute these applications most cost-effectively. Although the optimal machine configurations obtained vary for different applications, problem sizes, and budgets, the general trends for various applications are similar. Accordingly, for the applications studied, assuming a onr billion logic transistor equivalent area, we recommend building a Raw chip with approximately 1,000 tiles, 30 words/cycle global I/O, 20 Kbytes of local memory per tile, three to four words/cycle local communication bandwidth, and single-issue processors. This configuration will give performance near the global optimum for most applications.