The performance of the microprocessors that power modern computers has increased exponentially over the years for two main reasons. First, the transistors at the heart of all processor and memory circuits have become faster over time, on a course described by Moore’s law [1], and this directly improves the performance of processors built with those transistors. Second, actual processor performance has increased faster than Moore’s law would predict [2], because processor designers have harnessed the growing number of transistors available on modern chips to extract more parallelism from software. Figure 1 depicts this trend for Intel’s processors.