Throughput computing, achieved through multithreading and multicore technology, can lead to performance improvements that are 10 to 30× those of conventional processors and systems. However, such systems should also offer good single-thread performance. Here, the authors show that hardware scouting increases the performance of an already robust core by up to 40 percent for commercial benchmarks.