DataScalar architectures improve memory-system performance by running computation redundantly across multiple processors, each tightly coupled with an associated memory. The program's data set (and/or text) is distributed across these memories. In this execution model, each processor broadcasts operands it loads from its local memory to all other units. In this paper, we describe the benefits, costs, and problems associated with the DataScalar model. We also present simulation results of one possible implementation of a DataScalar system. In our simulated implementation, six unmodified SPEC95 binaries ran from 7% slower to 50% faster on two nodes, and from 9% to 100% faster on four nodes, than on a system with a comparable but more traditional memory system. Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail. We conclude with a discussion of how DataScalar systems may accommodate traditional parallel processing, thereby improving performance over a much wider range of applications than is currently possible with either model.
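The execution model above can be illustrated with a toy simulation (a sketch of our own, not the paper's simulator; all class and function names are hypothetical). Each node redundantly executes the same load sequence; the memory is partitioned so every address has exactly one owner, and the owner broadcasts the loaded value instead of any node issuing a remote request:

```python
# Toy sketch of the DataScalar owner-broadcast model: computation is
# replicated on every node, memory is partitioned across nodes, and
# the node owning an address broadcasts the loaded value to the rest.

class Node:
    def __init__(self, node_id, local_mem):
        self.id = node_id
        self.local_mem = local_mem   # dict: the addresses this node owns
        self.broadcasts_sent = 0

    def owns(self, addr):
        return addr in self.local_mem

def run_loads(nodes, load_sequence):
    """Redundantly execute a sequence of loads on every node.

    For each load, the single owner broadcasts the value; all other
    nodes consume the broadcast rather than sending request messages,
    so every communication is one-way.
    """
    results = []
    for addr in load_sequence:
        owner = next(n for n in nodes if n.owns(addr))
        value = owner.local_mem[addr]
        owner.broadcasts_sent += 1
        # Every node (owner included) uses the broadcast value and
        # continues executing the identical instruction stream.
        results.append(value)
    return results

# Two nodes, with the data set distributed between their memories.
n0 = Node(0, {0: 10, 1: 11})
n1 = Node(1, {2: 20, 3: 21})
vals = run_loads([n0, n1], [0, 2, 1, 3])
```

The point of the sketch is that no load ever generates a request/response round trip: because execution is replicated, the owner already knows it must supply the value, so only one-way broadcast traffic crosses the interconnect.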