Computer Architecture: A Quantitative Approach

Authors:
John L. Hennessy; David A. Patterson
Affiliations:
-;-
Venue:
Computer Architecture: A Quantitative Approach
Year:
2003

Citing 0
Cited 282

Fast deterministic consensus in a noisy environment

Journal of Algorithms
Reliability Mechanisms for Very Large Storage Systems

MSS '03 Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03)
Guerrilla Tactics: Motivating Design Patterns for High-Dependability Applications

SEW '02 Proceedings of the 27th Annual NASA Goddard Software Engineering Workshop (SEW-27'02)
MisSPECulation: partial and misleading use of SPEC CPU2000 in computer architecture conferences

Proceedings of the 30th annual international symposium on Computer architecture
Variable Instruction Set Architecture and Its Compiler Support

IEEE Transactions on Computers
Language support for lightweight transactions

OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A general purpose adaptivity driver for FE software

Software—Practice & Experience
Aliasing and anti-aliasing in branch history table prediction

ACM SIGARCH Computer Architecture News
Compositional Memory Systems for Data Intensive Applications

Proceedings of the conference on Design, automation and test in Europe - Volume 1
A first glance at Kilo-instruction based multiprocessors

Proceedings of the 1st conference on Computing frontiers
Reducing traffic generated by conflict misses in caches

Proceedings of the 1st conference on Computing frontiers
Approximating the optimal replacement algorithm

Proceedings of the 1st conference on Computing frontiers
Execution characteristics of SPEC CPU2000 benchmarks: Intel C++ vs. Microsoft VC++

ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite

ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Two parallel implementations for one dimension FFT on symmetric multiprocessors

ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Power-aware branch prediction techniques: a compiler-hints based approach for VLIW processors

Proceedings of the 14th ACM Great Lakes symposium on VLSI
Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches

Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
The design of a low power asynchronous multiplier

Proceedings of the 2004 international symposium on Low power electronics and design
A VLIW low power Java processor for embedded applications

SBCCI '04 Proceedings of the 17th symposium on Integrated circuits and system design
A Multilevel Computing Architecture for Embedded Multimedia Applications

IEEE Micro
Latency lags bandwidth

Communications of the ACM - Voting systems
Design for Timing Predictability

Real-Time Systems
Minos: Control Data Attack Prevention Orthogonal to Memory Model

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Segment-based proxy caching for distributed cooperative media content servers

ACM SIGOPS Operating Systems Review
Compositional Memory Systems for Multimedia Communicating Tasks

Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks

ISQED '05 Proceedings of the 6th International Symposium on Quality of Electronic Design
Power and Energy Profiling of Scientific Applications on Distributed Systems

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Resource Allocation for Periodic Applications in a Shipboard Environment

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 1 - Volume 02
Improving Energy-Efficiency by Bypassing Trivial Computations

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
Improvement of Power-Performance Efficiency for High-End Computing

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
A case for multi-level main memory

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A security assessment of the minos architecture

ACM SIGARCH Computer Architecture News - Special issue: Workshop on architectural support for security and anti-virus (WASSA)
An in-depth look at computer performance growth

ACM SIGARCH Computer Architecture News - Special issue: Workshop on architectural support for security and anti-virus (WASSA)
GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks

IEEE Transactions on Computers
Low-power branch prediction techniques for VLIW architectures: a compiler-hints based approach

Integration, the VLSI Journal - Special issue: ACM great lakes symposium on VLSI
Motion estimation performance of the TM3270 processor

Proceedings of the 2005 ACM symposium on Applied computing
Shared memory multiprocessor support for functional array processing in SAC

Journal of Functional Programming
Exploiting Structural Duplication for Lifetime Reliability Enhancement

Proceedings of the 32nd annual international symposium on Computer Architecture
Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures

IEEE Transactions on Computers
Performance sensitivity of SPEC CPU2000 over operating frequency

ISICT '04 Proceedings of the 2004 international symposium on Information and communication technologies
Snug set-associative caches: reducing leakage power while improving performance

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A Simple Project for Teaching Instruction Set Architecture

ICALT '05 Proceedings of the Fifth IEEE International Conference on Advanced Learning Technologies
Novel architecture for loop acceleration: a case study

CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Segment protection for embedded systems using run-time checks

Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
MTSS: multi task stack sharing for embedded systems

Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Power complexity of multiplexer-based optoelectronic crossbar switches

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Online performance analysis by statistical sampling of microprocessor performance counters

Proceedings of the 19th annual international conference on Supercomputing
Automating tactile graphics translation

Proceedings of the 7th international ACM SIGACCESS conference on Computers and accessibility
Towards a cross-platform microbenchmark suite for evaluating hardware performance counter data

Proceedings of the 2005 conference on Diversity in computing
The Future of Microprocessors

Queue - Multiprocessors
Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks

FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Temperature-Dependent Optimization of Cache Leakage Power Dissipation

ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Improving data cache performance with integrated use of split caches, victim cache and stream buffers

MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications, systems and architecture
Predicting the Performance of a 3D Processor-Memory Chip Stack

IEEE Design & Test
Instruction-level test methodology for CPU core self-testing

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Principles of Timing Anomalies in Superscalar Processors

QSIC '05 Proceedings of the Fifth International Conference on Quality Software
Performance characteristics of MAUI: an intelligent memory system architecture

Proceedings of the 2005 workshop on Memory system performance
Power-performance considerations of parallel computing on chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
High-level optimization of pipeline design

HLDVT '03 Proceedings of the Eighth IEEE International Workshop on High-Level Design Validation and Test Workshop
Lazy BTB: reduce BTB energy consumption using dynamic profiling

ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Dynamic thread assignment on heterogeneous multiprocessor architectures

Proceedings of the 3rd conference on Computing frontiers
Static cache partitioning robustness analysis for embedded on-chip multi-processors

Proceedings of the 3rd conference on Computing frontiers
Compositional, efficient caches for a chip multi-processor

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Power/performance hardware optimization for synchronization intensive applications in MPSoCs

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Functional test generation using property decompositions for validation of pipelined processors

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Memory latency consideration for load sharing on heterogeneous network of workstations

Journal of Systems Architecture: the EUROMICRO Journal
A Self Test Program Design Technique for Embedded DSP Cores

Journal of Electronic Testing: Theory and Applications
An Adaptive Load Balancer for Multiprocessor Routers

Simulation
Architecture description language (ADL)-driven software toolkit generation for architectural exploration of programmable SOCs

Proceedings of the 41st annual Design Automation Conference
Speculative virtual verification: policy-constrained speculative execution

NSPW '05 Proceedings of the 2005 workshop on New security paradigms
Making a case for split data caches for embedded applications

MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
An efficient synchronization technique for multiprocessor systems on-chip

MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture
Design space exploration for 3D architectures

ACM Journal on Emerging Technologies in Computing Systems (JETC)
Two-level mapping based cache index selection for packet forwarding engines

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Power reduction of multiple disks using dynamic cache resizing and speed control

Proceedings of the 2006 international symposium on Low power electronics and design
An experimental evaluation of a distributed Java compiler

Proceedings of the 43rd annual Southeast regional conference - Volume 2
Heterogeneous multiprocessor implementations for JPEG: a case study

CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools

Proceedings of the 20th annual international conference on Supercomputing
Implementing virtual memory in a vector processor with software restart markers

Proceedings of the 20th annual international conference on Supercomputing
An effective network processor design framework: using multi-objective evolutionary algorithms and object oriented techniques to optimise the Intel IXP1200 network processor

Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems
Minos: Architectural support for protecting control data

ACM Transactions on Architecture and Code Optimization (TACO)
In-Network Cache Coherence

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Real-time rendering systems in 2010

SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
Snug set-associative caches: Reducing leakage power of instruction and data caches with no performance penalties

ACM Transactions on Architecture and Code Optimization (TACO)
RIMAC: a novel redundancy-based hierarchical cache architecture for energy efficient, high performance storage systems

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
SLAS: An efficient approach to scaling round-robin striped volumes

ACM Transactions on Storage (TOS)
Concurrent programming without locks

ACM Transactions on Computer Systems (TOCS)
Performance/area efficiency in chip multiprocessors with micro-caches

Proceedings of the 4th international conference on Computing frontiers
Speculative trivialization point advancing in high-performance processors

Journal of Systems Architecture: the EUROMICRO Journal
Applying a constructivist and collaborative methodological approach in engineering education

Computers & Education
Optimistic parallelism requires abstractions

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Pipelined Execution of Critical Sections Using Software-Controlled Caching in Network Processors

Proceedings of the International Symposium on Code Generation and Optimization
Energy-Efficient Multiprocessor Systems-on-Chip for Embedded Computing: Exploring Programming Models and Their Architectural Support

IEEE Transactions on Computers
Improving Instruction Set Architecture learning results

WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
Integrating research and e-learning in advanced computer architecture courses

WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
DARC2: 2nd generation DLX architecture simulator

WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
An embedded systems course and course sequence

WCAE '05 Proceedings of the 2005 workshop on Computer architecture education: held in conjunction with the 32nd International Symposium on Computer Architecture
PSATSim: an interactive graphical superscalar architecture simulator for power and performance analysis

WCAE '06 Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture
An execution-driven simulation tool for teaching cache memories in introductory computer organization courses

WCAE '06 Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture
A log buffer-based flash translation layer using fully-associative sector translation

ACM Transactions on Embedded Computing Systems (TECS)
How branch mispredictions affect quicksort

ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Design methodology for pipelined heterogeneous multiprocessor system

Proceedings of the 44th annual Design Automation Conference
A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors

Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Vector processing as an enabler for software-defined radio in handheld devices

EURASIP Journal on Applied Signal Processing
Automated tactile graphics translation: in the field

Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility
Intelligent selection of application-specific garbage collectors

Proceedings of the 6th international symposium on Memory management
Fault Tolerant Interleaved Switching Fabrics For Scalable High-Performance Routers

IEEE Transactions on Parallel and Distributed Systems
A highly efficient implementation of back propagation algorithm using matrix instruction set architecture

Neural, Parallel & Scientific Computations
When to use splay trees

Software—Practice & Experience
Amdahl's law revisited for single chip systems

International Journal of Parallel Programming
Cell broadband engine processor vault security architecture

IBM Journal of Research and Development
Cache efficient data structures and algorithms for adaptive multidimensional multilevel finite element solvers

Applied Numerical Mathematics
A hybrid Branch-and-Bound and evolutionary approach for allocating strings of applications to heterogeneous distributed computing systems

Journal of Parallel and Distributed Computing
Adaptive prefetching algorithm in disk controllers

Performance Evaluation
Maurer computers for pipelined instruction processing

Mathematical Structures in Computer Science
NBTI resilient circuits using adaptive body biasing

Proceedings of the 18th ACM Great Lakes symposium on VLSI
Low power microarchitecture with instruction reuse

Proceedings of the 5th conference on Computing frontiers
Specification-driven directed test generation for validation of pipelined processors

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Tiny split data-caches make big performance impact for embedded applications

Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Configurable data memory for multimedia processing

Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Architectural exploration of heterogeneous multiprocessor systems for JPEG

International Journal of Parallel Programming - Special Issue on Multiprocessor-based embedded systems
A reconfigurable FTL (flash translation layer) architecture for NAND flash-based applications

ACM Transactions on Embedded Computing Systems (TECS)
A pipelined-loop-compatible architecture and algorithm to reduce variable-length sets of floating-point data on a reconfigurable computer

Journal of Parallel and Distributed Computing
A highly efficient implementation of a backpropagation learning algorithm using matrix ISA

Journal of Parallel and Distributed Computing
Early detection and bypassing of trivial operations to improve energy efficiency of processors

Microprocessors & Microsystems
High-performance computing of 1/√xi and exp(±xi) for a vector of inputs xi on Alpha and IA-64 CPUs

Journal of Systems Architecture: the EUROMICRO Journal
A quantitative analysis of the .NET common language runtime

Journal of Systems Architecture: the EUROMICRO Journal
Cache aware mapping of streaming applications on a multiprocessor system-on-chip

Proceedings of the conference on Design, automation and test in Europe
Testing diagnostics of modern microprocessors with the use of functional models

Automation and Remote Control
A Non-blocking Multithreaded Architecture with Support for Speculative Threads

ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
An Application of Constraint Programming to Superblock Instruction Scheduling

CP '08 Proceedings of the 14th international conference on Principles and Practice of Constraint Programming
Using EventB to Create a Virtual Machine Instruction Set Architecture

ABZ '08 Proceedings of the 1st international conference on Abstract State Machines, B and Z
Resource conflict detection in simulation of function unit pipelines

Journal of Systems Architecture: the EUROMICRO Journal
Predictable programming on a precision timed architecture

CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Core cannibalization architecture: improving lifetime chip performance for multicore processors in the presence of hard faults

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Leveraging on-chip networks for data cache migration in chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
FPGA Architecture: Survey and Challenges

Foundations and Trends in Electronic Design Automation
SpotCore: a power-efficient embedded processor for intelligent sensor networks

Proceedings of the ICST 2nd international conference on Body area networks
IP Routing table compaction and sampling schemes to enhance TCAM cache performance

Journal of Systems Architecture: the EUROMICRO Journal
Learning heuristics for basic block instruction scheduling

Journal of Heuristics
Embedded DSP Processor Design: Application Specific Instruction Set Processors

Embedded DSP Processor Design: Application Specific Instruction Set Processors
Processor Description Languages

Processor Description Languages
Solving dense linear systems on platforms with multiple hardware accelerators

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Static Cache Partitioning Robustness Analysis for Embedded On-Chip Multi-processors

Transactions on High-Performance Embedded Architectures and Compilers I
DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Modelling and verification of superscalar Micro-architectures functional approach

ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
A DSP-enhanced 32-bit embedded microprocessor

Journal of Embedded Computing - Selected papers of EUC 2005
On the interpretation of mathematical entities in the formalisation of programming and modelling languages

Mathematical Structures in Computer Science
Issue Mechanism for Embedded Simultaneous Multithreading Processor

IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Cache Controller Design on Ultra Low Leakage Embedded Processors

ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Functional test generation using design and property decomposition techniques

ACM Transactions on Embedded Computing Systems (TECS)
Automatic constraint based test generation for behavioral HDL models

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design of an interconnect architecture and signaling technology for parallelism in communication

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Memory hierarchy exploration for accelerating the parallel computation of SVDs

Neural, Parallel & Scientific Computations
Instruction-Level Fault Tolerance Configurability

Journal of Signal Processing Systems
An Interactive Approach to Timing Accurate PCI-X Simulation

RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Configurable emulated shared memory architecture for general purpose MP-SOCs and NOC regions

NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Programmable and Scalable Architecture for Graphics Processing Units

SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors

Journal of Signal Processing Systems
The Design and Evaluation of a Selective Way Based Trace Cache

APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Architecture Design for Soft Errors

Architecture Design for Soft Errors
Some resources for teaching concurrency

Proceedings of the 7th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
A decentralised task mapping approach for homogeneous multiprocessor network-on-chips

International Journal of Reconfigurable Computing - Selected papers from ReCoSoc08
81.6 GOPS object recognition processor based on a memory-centric NoC

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A configurable MIPS simulator for teaching computer architecture

CATE '07 Proceedings of the 10th IASTED International Conference on Computers and Advanced Technology in Education
Automating the generation of composed linear algebra kernels

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Runlength-based processing methods for low bit-depth images

IEEE Transactions on Image Processing
Cache replacement policies for IP address lookups

CSS '07 Proceedings of the Fifth IASTED International Conference on Circuits, Signals and Systems
Evaluating multicore algorithms on the unified memory model

Scientific Programming - Software Development for Multi-core Computing Systems
Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Design Space Exploration for an ASIP/Co-Processor Architecture used in GNSS Receivers

Journal of Signal Processing Systems
A novel compaction scheme for routing tables in TCAM to enhance cache hit rate

CIIT '07 The Sixth IASTED International Conference on Communications, Internet, and Information Technology
A Novel instruction stream buffer for VLIW architectures

Computers and Electrical Engineering
Finding representative workloads for computer system design

Finding representative workloads for computer system design
Entropy representation of memory access characteristics and cache performance

ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
Queuing theoretic model for a multiprocessor with private caches and shared memory

ACM SIGARCH Computer Architecture News
Hardware implementation of strategies for servicing queues

CompSysTech '09 Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing
The worst page-replacement policy

FUN'07 Proceedings of the 4th international conference on Fun with algorithms
A configurable multi-ported register file architecture for soft processor cores

ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
High level performance metrics for FPGA-based multiprocessor systems

Performance Evaluation
Resource conflict detection in simulation of function unit pipelines

SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Evaluating a low-power dual-core architecture

APPT'07 Proceedings of the 7th international conference on Advanced parallel processing technologies
An algorithm to improve parallelism in distributed systems using asynchronous calls

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Parallel multiprocessor approaches to the RNA folding problem

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Architectural implications of cache coherence protocols with network applications on chip multiprocessors

NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Mining Query Logs: Turning Search Usage Data into Knowledge

Foundations and Trends in Information Retrieval
Enigma: architectural and operating system support for reducing the impact of address translation

Proceedings of the 24th ACM International Conference on Supercomputing
Power-aware BTB for modern processors

Computers and Electrical Engineering
Applied inference: Case studies in microarchitectural design

ACM Transactions on Architecture and Code Optimization (TACO)
A multi-streaming SIMD multimedia computing engine

Microprocessors & Microsystems
On the operating unit size of load/store architectures

Mathematical Structures in Computer Science
WAYPOINT: scaling coherence to thousand-core architectures

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A cooperative sensing based spectrum broker for dynamic spectrum access

MILCOM'09 Proceedings of the 28th IEEE conference on Military communications
Efficient decision ordering techniques for SAT-based test generation

Proceedings of the Conference on Design, Automation and Test in Europe
MB-LITE: a robust, light-weight soft-core implementation of the MicroBlaze architecture

Proceedings of the Conference on Design, Automation and Test in Europe
Cross-layer speculative architecture for end systems and gateways in computer networks with lossy links

Wireless Networks
Improving the Performance of Hyperspectral Image and Signal Processing Algorithms Using Parallel, Distributed and Specialized Hardware-Based Systems

Journal of Signal Processing Systems
Aircraft integration real-time simulator modeling with AADL for architecture tradeoffs

Proceedings of the Conference on Design, Automation and Test in Europe
Fine-grain dynamic instruction placement for L0 scratch-pad memory

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Register allocation with instruction scheduling for VLIW-architectures

Programming and Computing Software
Dynamically reconfigurable cache architecture using adaptive block allocation policy

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploiting locality: a flexible DSM approach

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Redesigning the string hash table, burst trie, and BST to exploit cache

Journal of Experimental Algorithmics (JEA)
Exploitation of multicore systems in a java virtual machine

IBM Journal of Research and Development
A pattern language for parallelizing irregular algorithms

Proceedings of the 2010 Workshop on Parallel Programming Patterns
An embedded compression algorithm integrated with Motion JPEG2000 system for reduction of off-chip video memory bandwidth

International Journal of Intelligent Systems Technologies and Applications
FastScale: accelerate RAID scaling by minimizing data migration

FAST'11 Proceedings of the 9th USENIX conference on File and storage technologies
Quantitative analysis and optimization techniques for on-chip cache leakage power

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Efficient synchronization for embedded on-chip multiprocessors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Should we worry about memory loss?

ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Atropos: a disk array volume manager for orchestrated use of disks

FAST'04 Proceedings of the 3rd USENIX conference on File and storage technologies
A dynamic workload balancing technique of a text matching algorithm on a cluster

TELE-INFO'06 Proceedings of the 5th WSEAS international conference on Telecommunications and informatics
An extended proof-carrying code framework for security enforcement

Transactions on computational science XI
Towards hardware acceleration of neuroevolution for multimedia processing applications on mobile devices

ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
Branch target buffers: WCET analysis framework and timing predictability

Journal of Systems Architecture: the EUROMICRO Journal
Reducing Network-on-Chip energy consumption through spatial locality speculation

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Column-selection-enabled 8T SRAM array with ~1R/1W multi-port operation for DVFS-enabled processors

Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Operating two InfiniBand grid clusters over 28 km distance

International Journal of Grid and Utility Computing
Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

The Journal of Supercomputing
Understanding prediction limits through unbiased branches

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Exploring the capacity of a modern SMT architecture to deliver high scientific application performance

HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Implementing cryptographic pairings on smartcards

CHES'06 Proceedings of the 8th international conference on Cryptographic Hardware and Embedded Systems
Evaluation of state-of-the-art hardware architectures for fast cone-beam CT reconstruction

Parallel Computing
A DSP-Enhanced 32-bit embedded microprocessor

EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Program parallelization using synchronized pipelining

LOPSTR'09 Proceedings of the 19th international conference on Logic-Based Program Synthesis and Transformation
A low-power DSP-enhanced 32-bit EISC processor

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Towards cache-optimized multigrid using patch-adaptive relaxation

PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
RAPANUI: rapid prototyping for media processor architecture exploration

SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Power-aware branch logic: a hardware based technique for filtering access to branch logic

SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
PSnAP: accurate synthetic address streams through memory profiles

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Efficient scratchpad allocation algorithms for energy constrained embedded systems

PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Architectural enhancements for color image and video processing on embedded systems

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Arithmetic data value speculation

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Application-Specific hardware-driven prefetching to improve data cache performance

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Defining and measuring performance characteristics of current video games

MMB&DFT'10 Proceedings of the 15th international GI/ITG conference on Measurement, Modelling, and Evaluation of Computing Systems and Dependability and Fault Tolerance
A vector approach to cryptography implementation

DRMTICS'05 Proceedings of the First international conference on Digital Rights Management: technologies, Issues, Challenges and Systems
SRAM CP: a charge recycling design schema for SRAM

PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
The design of a dataflow coprocessor for low power embedded hierarchical processing

PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Compile-Time energy optimization for parallel applications in on-chip multiprocessors

ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Cache based power analysis attacks on AES

ACISP'06 Proceedings of the 11th Australasian conference on Information Security and Privacy
Predicting secret keys via branch prediction

CT-RSA'07 Proceedings of the 7th Cryptographers' track at the RSA conference on Topics in Cryptology
Assertion-Based verification for the SpaceCAKE multiprocessor – a case study

HVC'05 Proceedings of the First Haifa international conference on Hardware and Software Verification and Testing
Dynamic Cache Reconfiguration for Soft Real-Time Systems

ACM Transactions on Embedded Computing Systems (TECS)
Effective and efficient microprocessor design space exploration using unlabeled design configurations

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Parallel iterative compilation: using MapReduce to speedup machine learning in compilers

Proceedings of third international workshop on MapReduce and its Applications
Distributed approximate spectral clustering for large-scale datasets

Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Decoding of Raptor codes on embedded systems

Microprocessors & Microsystems
A novel AES-256 implementation on FPGA using co-processor based architecture

Proceedings of the International Conference on Advances in Computing, Communications and Informatics
INVISIOS: A Lightweight, Minimally Intrusive Secure Execution Environment

ACM Transactions on Embedded Computing Systems (TECS)
Probabilistic resource allocation in heterogeneous distributed systems with random failures

Journal of Parallel and Distributed Computing
Memory Latency Hiding by Load Value Speculation for Reconfigurable Computers

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Reducing NBTI-induced processor wearout by exploiting the timing slack of instructions

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Coherent out-of-core point-based global illumination

EGSR'11 Proceedings of the Twenty-second Eurographics conference on Rendering
Case study of multithreaded in-core isosurface extraction algorithms

EG PGV'04 Proceedings of the 5th Eurographics conference on Parallel Graphics and Visualization
Improving communication latency with the write-only architecture

Journal of Parallel and Distributed Computing
API compilation for image hardware accelerators

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Analytical modeling for multi-transaction bus on distributed systems

ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Interactive visualization for memory reference traces

EuroVis'08 Proceedings of the 10th Joint Eurographics / IEEE - VGTC conference on Visualization
A Resource-Aware Dynamic Load-Balancing Parallelization Algorithm in a Farmer-Worker Environment

International Journal of Adaptive, Resilient and Autonomic Systems
Fast Likelihood Computation in Speech Recognition using Matrices

Journal of Signal Processing Systems
Step-by-step design and simulation of a simple CPU architecture

Proceeding of the 44th ACM technical symposium on Computer science education
Ten Years of Building Broken Chips: The Physics and Engineering of Inexact Computing

ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
On the Simulation of HCI-Induced Variations of IC Timings at High Level

Journal of Electronic Testing: Theory and Applications
Trace construction using enhanced performance monitoring

Proceedings of the ACM International Conference on Computing Frontiers
Scaling energy per operation via an asynchronous pipeline

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and Evaluation of a New Approach to RAID-0 Scaling

ACM Transactions on Storage (TOS)
Automated generation of directed tests for transition coverage in cache coherence protocols

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
NBTI mitigation by optimized NOP assignment and insertion

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
A new routing scheme for Jellyfish and its performance with HPC workloads

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

Journal of Computer and System Sciences
On the Impact of Performance Faults in Modern Microprocessors

Journal of Electronic Testing: Theory and Applications
Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
An algorithmic framework for solving large-scale multistage stochastic mixed 0-1 problems with nonsymmetric scenario trees. Part II: Parallelization

Computers and Operations Research
The roles of mathematics in computer science

ACM Inroads
Design example of useful memory latency for developing a hazard preventive pipeline high-performance embedded-microprocessor

VLSI Design - Special issue on Advanced VLSI Design Methodologies for Emerging Industrial Multimedia and Communication Applications
Effective and efficient microprocessor design space exploration using unlabeled design configurations

ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
Analysis of dependence tracking algorithms for task dataflow execution

ACM Transactions on Architecture and Code Optimization (TACO)
EVA: an efficient vision architecture for mobile systems

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Optimizing image processing on multi-core CPUs with Intel parallel programming technologies

Multimedia Tools and Applications
On the Behaviours Produced by Instruction Sequences under Execution

Fundamenta Informaticae

Quantified Score

Hi-index	0.01

Abstract

From the Book: I am very lucky to have studied computer architecture under Prof. David Patterson at U.C. Berkeley more than 20 years ago. I enjoyed the courses I took from him, in the early days of RISC architecture. Since leaving Berkeley to help found Sun Microsystems, I have used the ideas from his courses and many more that are described in this important book.

The good news today is that this book covers incredibly important and contemporary material. The further good news is that much exciting and challenging work remains to be done, and that working from Computer Architecture: A Quantitative Approach is a great way to start.

The most successful architectural projects that I have been involved in have always started from simple ideas, with advantages explainable using simple numerical models derived from hunches and rules of thumb. The continuing rapid advances in computing technology and new applications ensure that we will need new similarly simple models to understand what is possible in the future, and that new classes of applications will stress systems in different and interesting ways. The quantitative approach introduced in Chapter 1 is essential to understanding these issues. In particular, we expect to see, in the near future, much more emphasis on minimizing power to meet the demands of a given application, across all sizes of systems; much remains to be learned in this area.

I have worked with many different instruction sets in my career. I first programmed a PDP-8, whose instruction set was so simple that a friend easily learned to disassemble programs just by glancing at the hole punches in paper tape! I wrote a lot of code in PDP-11 assembler, including an interpreter for the Pascal programming language and for the VAX (which was used as an example in the first edition of this book); the success of the VAX led to the widespread use of UNIX on the early Internet.
The PDP-11 and VAX were very conventional complex instruction set (CISC) computer architectures, with relatively compact instruction sets that proved nearly impossible to pipeline. For a number of years in public talks I used the performance of the VAX 11/780 as the baseline; its speed was extremely well known because faster implementations of the architecture were so long delayed. VAX performance stalled out just as the x86 and 680x0 CISC architectures were appearing in microprocessors; the strong economic advantages of microprocessors led to their overwhelming dominance.

Then the simpler reduced instruction set (RISC) computer architectures (pioneered by John Cocke at IBM; promoted and named by Patterson and Hennessy; and commercialized in POWER PC, MIPS, and SPARC) were implemented as microprocessors and permitted high-performance pipeline implementations through the use of their simple register-oriented instruction sets. A downside of RISC was the larger code size of programs and resulting greater instruction fetch bandwidth, a cost that could be seen to be acceptable using the techniques of Chapter 1 and by believing in the future CMOS technology trends promoted in the now-classic views of Carver Mead. The kind of clear-thinking approach to the present problems and to the shape of future computing advances that led to RISC architecture is the focus of this book.

Chapter 2 (and various appendices) presents interesting examples of contemporary and important historical instruction set architecture. RISC architecture, the focus of so much work in the last twenty years, is by no means the final word here. I worked on the design of the SPARC architecture and several implementations for a decade, but more recently have worked on two different styles of processor: picoJava, which implemented most of the Java Virtual Machine instructions (a compact, high-level, bytecoded instruction set), and MAJC, a very simple and multithreaded VLIW for Java and media-intensive applications.
These two architectures addressed different and new market needs: for low-power chips to run embedded devices where space and power are at a premium, and for high performance for a given amount of power and cost where parallel applications are possible. While neither has achieved widespread commercial success, I expect that the future will see many opportunities for different ISAs, and an in-depth knowledge of history here often gives great guidance: the relationships between key factors, such as the program size, execution speed, and power consumption, returning to previous balances that led to great designs in the past.

Chapters 3 and 4 describe instruction-level parallelism (ILP): the ability to execute more than one instruction at a time. This has been aided greatly, in the last 20 years, by techniques such as RISC and VLIW (very long instruction word) computing. But as later chapters here point out, both RISC and especially VLIW as practiced in the Intel Itanium architecture are very power intensive. In our attempts to extract more instruction-level parallelism, we are running up against the fact that the complexity of a design that attempts to execute N instructions simultaneously grows like N^2: the number of transistors and number of watts to produce each result increases dramatically as we attempt to execute many instructions of arbitrary programs simultaneously. There is thus a clear countertrend emerging: using simpler pipelines with more realistic levels of ILP while exploiting other kinds of parallelism by running both multiple threads of execution per processor and, often, multiple processors on a single chip. The challenge for designers of high-performance systems of the future is to understand when simultaneous execution is possible, but then to use these techniques judiciously in combination with other, less granular techniques that are less power intensive and complex.
In graduate school I would often joke that cache memories were the only great idea in computer science. But truly, where you put things affects profoundly the design of computer systems. Chapter 5 describes the classical design of cache and main memory hierarchies and virtual memory. And now, new, higher-level programming languages like Java support much more reliable software because they insist on the use of garbage collection and array bounds checking, so that security breaches from "buffer overflow" and insidious bugs from false sharing of memory do not creep into large programs. It is only languages, such as Java, that insist on the use of automatic storage management that can implement true software components. But garbage collectors are notoriously hard on memory hierarchies, and the design of systems and language implementations to work well for such areas is an active area of research, where much good work has been done but much exciting work remains.

Java also strongly supports thread-level parallelism, a key to simple, power-efficient, and high-performance system implementations that avoids the N^2 problem discussed earlier but brings challenges of its own. A good foundational understanding of these issues can be had in Chapter 6. Traditionally, each processor was a separate chip, and keeping the various processors synchronized was expensive, both because of its impact on the memory hierarchy and because the synchronization operations themselves were very expensive. The Java language is also trying to address these issues: we tried, in the Java Language Specification, which I coauthored, to write a description of the memory model implied by the language. While this description turned out to have (fixable) technical problems, it is increasingly clear that we need to think about the memory hierarchy in the design of languages that are intended to work well on the newer system platforms.
We view the Java specification as a first step in much good work to be done in the future.

As Chapter 7 describes, storage has evolved from being connected to individual computers to being a separate network resource. This is reminiscent of computer graphics, where graphics processing that was previously done in a host processor often became a separate function as the importance of graphics increased. All this is likely to change radically in the coming years: massively parallel host processors are likely to be able to do graphics better than dedicated outboard graphics units, and new breakthroughs in storage technologies, such as memories made from molecular electronics and other atomic-level nanotechnologies, should greatly reduce both the cost of storage and the access time. The resulting dramatic decreases in storage cost and access time will strongly encourage the use of multiple copies of data stored on individual computing nodes, rather than shared over a network. The "wheel of reincarnation," familiar from graphics, will appear in storage.

Chapter 8 provides a great foundational description of computer interconnects and networks. My model of these comes from Andy Bechtolsheim, another of the cofounders of Sun, who famously said, "Ethernet always wins." More modestly stated: given the need for a new networking interconnect, and despite its shortcomings, adapted versions of the Ethernet protocols seem to have met with overwhelming success in the marketplace. Why? Factors such as the simplicity and familiarity of the protocols are obvious, but quite possibly the most likely reason is that the people who are adapting Ethernet can get on with the job at hand rather than arguing about details that, in the end, aren't dispositive. This lesson can be generalized to apply to all the areas of computer architecture discussed in this book.
One of the things I remember Dave Patterson saying many years ago is that for each new project you only get so many "cleverness beans." That is, you can be very clever in a few areas of your design, but if you try to be clever in all of them, the design will probably fail to achieve its goals, or even fail to work or to be finished at all. The overriding lesson that I have learned in 20 plus years of working on these kinds of designs is that you must choose what is important and focus on that; true wisdom is to know what to leave out. A deep knowledge of what has gone before is key to this ability.

And you must also choose your assumptions carefully. Many years ago I attended a conference in Hawaii (yes, it was a boondoggle, but read on) where Maurice Wilkes, the legendary computer architect, gave a speech. What he said, paraphrased in my memory, is that good research often consists of assuming something that seems untrue or unlikely today will become true and investigating the consequences of that assumption. And if the unlikely assumption indeed then becomes true in the world, you will have done timely and sometimes, then, even great research! So, for example, the research group at Xerox PARC assumed that everyone would have access to a personal computer with a graphics display connected to others by an internetwork and the ability to print inexpensively using Xerography. How true all this became, and how seminally important their work was!

In our time, and in the field of computer architecture, I think there are a number of assumptions that will become true. Some are not controversial, such as that Moore's Law is likely to continue for another decade or so and that the complexity of large chip designs is reaching practical limits, often beyond the point of positive returns for additional complexity.
More controversially, perhaps, molecular electronics is likely to greatly reduce the cost of storage and probably logic elements as well, optical interconnects will greatly increase the bandwidth and reduce the error rates of interconnects, software will continue to be unreliable because it is so difficult, and security will continue to be important because its absence is so debilitating.

Taking advantage of the strong positive trends detailed in this book and using them to mitigate the negative ones will challenge the next generation of computer architects to design a range of systems of many shapes and sizes. Computer architecture design problems are becoming more varied and interesting. Now is an exciting time to be starting out or reacquainting yourself with the latest in this field, and this book is the best place to start. See you in the chips!