This article proposes a new architecture called "trace processors," which consist of multiple distributed on-chip processor cores, each of which simultaneously executes a different trace. All but one of the cores execute their traces speculatively, having used branch prediction to select the traces that follow the one currently executing. (This architectural concept is similar to multiscalar processors, described in a sidebar, but it does not require explicit compiler support.) The authors argue that future processors will rely heavily on replication and hierarchy, and they show how their architecture exploits these concepts.