The Future of Microprocessors

Authors:
Kunle Olukotun; Lance Hammond
Affiliations:
Stanford University; Stanford University
Venue:
Queue - Multiprocessors
Year:
2005

Citing 13
Cited 40

Interleaving: a multithreading technique targeting multiprocessors and workstations. ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Simultaneous multithreading: maximizing on-chip parallelism. ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiscalar processors. ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor. Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The Tera computer system. ICS '90 Proceedings of the 4th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading. IEEE Transactions on Computers
Piranha: a scalable architecture based on single-chip multiprocessing. Proceedings of the 27th annual international symposium on Computer architecture
The Stanford Hydra CMP. IEEE Micro
Web Search for a Planet: The Google Cluster Architecture. IEEE Micro
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization. HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Computer Architecture: A Quantitative Approach
Transactional Coherence and Consistency: Simplifying Parallel Hardware and Software. IEEE Micro
Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro


Abstract

The performance of the microprocessors that power modern computers has increased exponentially over the years for two main reasons. First, the transistors at the heart of the circuits in all processors and memory chips have become steadily faster, on a course described by Moore’s law [1], and this directly improves the performance of processors built with those transistors. Second, actual processor performance has increased faster than Moore’s law would predict [2], because processor designers have been able to harness the growing number of transistors available on modern chips to extract more parallelism from software. Figure 1 depicts this trend for Intel’s processors.