DataScalar architectures improve memory-system performance by running computation redundantly across multiple processors, each tightly coupled with an associated memory. The program's data set (and/or text) is distributed across these memories. In this execution model, each processor broadcasts operands it loads from its local memory to all other units. In this paper, we describe the benefits, costs, and problems associated with the DataScalar model. We also present simulation results of one possible implementation of a DataScalar system. In our simulated implementation, six unmodified SPEC95 binaries ran from 7% slower to 50% faster on two nodes, and from 9% to 100% faster on four nodes, than on a system with a comparable but more traditional memory system. Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail. We conclude with a discussion of how DataScalar systems may accommodate traditional parallel processing, thereby improving performance over a much wider range of applications than is currently possible with either model.
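The execution model above can be illustrated with a toy simulation (a sketch of our own, not the paper's simulator; all class and function names are hypothetical). Each node redundantly executes the same load sequence; the memory is partitioned so every address has exactly one owner, and the owner broadcasts the loaded value instead of any node issuing a remote request:

```python
# Toy sketch of the DataScalar owner-broadcast model: computation is
# replicated on every node, memory is partitioned across nodes, and
# the node owning an address broadcasts the loaded value to the rest.

class Node:
    def __init__(self, node_id, local_mem):
        self.id = node_id
        self.local_mem = local_mem   # dict: the addresses this node owns
        self.broadcasts_sent = 0

    def owns(self, addr):
        return addr in self.local_mem

def run_loads(nodes, load_sequence):
    """Redundantly execute a sequence of loads on every node.

    For each load, the single owner broadcasts the value; all other
    nodes consume the broadcast rather than sending request messages,
    so every communication is one-way.
    """
    results = []
    for addr in load_sequence:
        owner = next(n for n in nodes if n.owns(addr))
        value = owner.local_mem[addr]
        owner.broadcasts_sent += 1
        # Every node (owner included) uses the broadcast value and
        # continues executing the identical instruction stream.
        results.append(value)
    return results

# Two nodes, with the data set distributed between their memories.
n0 = Node(0, {0: 10, 1: 11})
n1 = Node(1, {2: 20, 3: 21})
vals = run_loads([n0, n1], [0, 2, 1, 3])
```

The point of the sketch is that no load ever generates a request/response round trip: because execution is replicated, the owner already knows it must supply the value, so only one-way broadcast traffic crosses the interconnect.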