Fast deterministic consensus in a noisy environment
Journal of Algorithms
Reliability Mechanisms for Very Large Storage Systems
MSS '03 Proceedings of the 20 th IEEE/11 th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03)
Guerrilla Tactics: Motivating Design Patterns for High-Dependability Applications
SEW '02 Proceedings of the 27th Annual NASA Goddard Software Engineering Workshop (SEW-27'02)
MisSPECulation: partial and misleading use of SPEC CPU2000 in computer architecture conferences
Proceedings of the 30th annual international symposium on Computer architecture
Variable Instruction Set Architecture and Its Compiler Support
IEEE Transactions on Computers
Language support for lightweight transactions
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
A general purpose adaptivity driver for FE software
Software—Practice & Experience
Aliasing and anti-aliasing in branch history table prediction
ACM SIGARCH Computer Architecture News
Compositional Memory Systems for Data Intensive Applications
Proceedings of the conference on Design, automation and test in Europe - Volume 1
A first glance at Kilo-instruction based multiprocessors
Proceedings of the 1st conference on Computing frontiers
Reducing traffic generated by conflict misses in caches
Proceedings of the 1st conference on Computing frontiers
Approximating the optimal replacement algorithm
Proceedings of the 1st conference on Computing frontiers
Execution characteristics of SPEC CPU2000 benchmarks: Intel C++ vs. Microsoft VC++
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Two parallel implementations for one dimension FFT on symmetric multiprocessors
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Power-aware branch prediction techniques: a compiler-hints based approach for VLIW processors
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
The design of a low power asynchronous multiplier
Proceedings of the 2004 international symposium on Low power electronics and design
A VLIW low power Java processor for embedded applications
SBCCI '04 Proceedings of the 17th symposium on Integrated circuits and system design
Communications of the ACM - Voting systems
Design for Timing Predictability
Real-Time Systems
Minos: Control Data Attack Prevention Orthogonal to Memory Model
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Segment-based proxy caching for distributed cooperative media content servers
ACM SIGOPS Operating Systems Review
Compositional Memory Systems for Multimedia Communicating Tasks
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
ISQED '05 Proceedings of the 6th International Symposium on Quality of Electronic Design
Power and Energy Profiling of Scientific Applications on Distributed Systems
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Resource Allocation for Periodic Applications in a Shipboard Environment
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 1 - Volume 02
Improving Energy-Efficiency by Bypassing Trivial Computations
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
Improvement of Power-Performance Efficiency for High-End Computing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 11 - Volume 12
A case for multi-level main memory
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A security assessment of the minos architecture
ACM SIGARCH Computer Architecture News - Special issue: Workshop on architectural support for security and anti-virus (WASSA)
An in-depth look at computer performance growth
ACM SIGARCH Computer Architecture News - Special issue: Workshop on architectural support for security and anti-virus (WASSA)
GAARP: A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks
IEEE Transactions on Computers
Low-power branch prediction techniques for VLIW architectures: a compiler-hints based approach
Integration, the VLSI Journal - Special issue: ACM great lakes symposium on VLSI
Motion estimation performance of the TM3270 processor
Proceedings of the 2005 ACM symposium on Applied computing
Shared memory multiprocessor support for functional array processing in SAC
Journal of Functional Programming
Exploiting Structural Duplication for Lifetime Reliability Enhancement
Proceedings of the 32nd annual international symposium on Computer Architecture
Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures
IEEE Transactions on Computers
Performance sensitivity of SPEC CPU2000 over operating frequency
ISICT '04 Proceedings of the 2004 international symposium on Information and communication technologies
Snug set-associative caches: reducing leakage power while improving performance
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A Simple Project for Teaching Instruction Set Architecture
ICALT '05 Proceedings of the Fifth IEEE International Conference on Advanced Learning Technologies
Novel architecture for loop acceleration: a case study
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Segment protection for embedded systems using run-time checks
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
MTSS: multi task stack sharing for embedded systems
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Power complexity of multiplexer-based optoelectronic crossbar switches
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Online performance analysis by statistical sampling of microprocessor performance counters
Proceedings of the 19th annual international conference on Supercomputing
Automating tactile graphics translation
Proceedings of the 7th international ACM SIGACCESS conference on Computers and accessibility
Towards a cross-platform microbenchmark suite for evaluating hardware performance counter data
Proceedings of the 2005 conference on Diversity in computing
Queue - Multiprocessors
Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Temperature-Dependent Optimization of Cache Leakage Power Dissipation
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Predicting the Performance of a 3D Processor-Memory Chip Stack
IEEE Design & Test
Instruction-level test methodology for CPU core self-testing
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Principles of Timing Anomalies in Superscalar Processors
QSIC '05 Proceedings of the Fifth International Conference on Quality Software
Performance characteristics of MAUI: an intelligent memory system architecture
Proceedings of the 2005 workshop on Memory system performance
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
High-level optimization of pipeline design
HLDVT '03 Proceedings of the Eighth IEEE International Workshop on High-Level Design Validation and Test Workshop
Lazy BTB: reduce BTB energy consumption using dynamic profiling
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Static cache partitioning robustness analysis for embedded on-chip multi-processors
Proceedings of the 3rd conference on Computing frontiers
Compositional, efficient caches for a chip multi-processor
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Power/performance hardware optimization for synchronization intensive applications in MPSoCs
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Functional test generation using property decompositions for validation of pipelined processors
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Memory latency consideration for load sharing on heterogeneous network of workstations
Journal of Systems Architecture: the EUROMICRO Journal
A Self Test Program Design Technique for Embedded DSP Cores
Journal of Electronic Testing: Theory and Applications
Proceedings of the 41st annual Design Automation Conference
Speculative virtual verification: policy-constrained speculative execution
NSPW '05 Proceedings of the 2005 workshop on New security paradigms
Making a case for split data caches for embedded applications
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
An efficient synchronization technique for multiprocessor systems on-chip
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Design space exploration for 3D architectures
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Two-level mapping based cache index selection for packet forwarding engines
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Power reduction of multiple disks using dynamic cache resizing and speed control
Proceedings of the 2006 international symposium on Low power electronics and design
An experimental evaluation of a distributed Java compiler
Proceedings of the 43rd annual Southeast regional conference - Volume 2
Heterogeneous multiprocessor implementations for JPEG:: a case study
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools
Proceedings of the 20th annual international conference on Supercomputing
Implementing virtual memory in a vector processor with software restart markers
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems
Minos: Architectural support for protecting control data
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Real-time rendering systems in 2010
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
SLAS: An efficient approach to scaling round-robin striped volumes
ACM Transactions on Storage (TOS)
Concurrent programming without locks
ACM Transactions on Computer Systems (TOCS)
Performance/area efficiency in chip multiprocessors with micro-caches
Proceedings of the 4th international conference on Computing frontiers
Speculative trivialization point advancing in high-performance processors
Journal of Systems Architecture: the EUROMICRO Journal
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Pipelined Execution of Critical Sections Using Software-Controlled Caching in Network Processors
Proceedings of the International Symposium on Code Generation and Optimization
Improving Instruction Set Architecture learning results
WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
Integrating research and e-learning in advanced computer architecture courses
WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
DARC2: 2nd generation DLX architecture simulator
WCAE '04 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture
An embedded systems course and course sequence
WCAE '05 Proceedings of the 2005 workshop on Computer architecture education: held in conjunction with the 32nd International Symposium on Computer Architecture
WCAE '06 Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture
WCAE '06 Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture
A log buffer-based flash translation layer using fully-associative sector translation
ACM Transactions on Embedded Computing Systems (TECS)
How branch mispredictions affect quicksort
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Design methodology for pipelined heterogeneous multiprocessor system
Proceedings of the 44th annual Design Automation Conference
Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Vector processing as an enabler for software-defined radio in handheld devices
EURASIP Journal on Applied Signal Processing
Automated tactile graphics translation: in the field
Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility
Intelligent selection of application-specific garbage collectors
Proceedings of the 6th international symposium on Memory management
Fault Tolerant Interleaved Switching Fabrics For Scalable High-Performance Routers
IEEE Transactions on Parallel and Distributed Systems
Neural, Parallel & Scientific Computations
Software—Practice & Experience
Amdahl's law revisited for single chip systems
International Journal of Parallel Programming
Cell broadband engine processor vault security architecture
IBM Journal of Research and Development
Applied Numerical Mathematics
Journal of Parallel and Distributed Computing
Adaptive prefetching algorithm in disk controllers
Performance Evaluation
Maurer computers for pipelined instruction processing†
Mathematical Structures in Computer Science
NBTI resilient circuits using adaptive body biasing
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Low power microarchitecture with instruction reuse
Proceedings of the 5th conference on Computing frontiers
Specification-driven directed test generation for validation of pipelined processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Tiny split data-caches make big performance impact for embedded applications
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Configurable data memory for multimedia processing
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Architectural exploration of heterogeneous multiprocessor systems for JPEG
International Journal of Parallel Programming - Special Issue on Multiprocessor-based embedded systems
A reconfigurable FTL (flash translation layer) architecture for NAND flash-based applications
ACM Transactions on Embedded Computing Systems (TECS)
Journal of Parallel and Distributed Computing
A highly efficient implementation of a backpropagation learning algorithm using matrix ISA
Journal of Parallel and Distributed Computing
Early detection and bypassing of trivial operations to improve energy efficiency of processors
Microprocessors & Microsystems
High-performance computing of 1/√xi and exp(±xi) for a vector of inputs xi on Alpha and IA-64 CPUs
Journal of Systems Architecture: the EUROMICRO Journal
A quantitative analysis of the .NET common language runtime
Journal of Systems Architecture: the EUROMICRO Journal
Cache aware mapping of streaming applications on a multiprocessor system-on-chip
Proceedings of the conference on Design, automation and test in Europe
Testing diagnostics of modern microprocessors with the use of functional models
Automation and Remote Control
A Non-blocking Multithreaded Architecture with Support for Speculative Threads
ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
An Application of Constraint Programming to Superblock Instruction Scheduling
CP '08 Proceedings of the 14th international conference on Principles and Practice of Constraint Programming
Using EventB to Create a Virtual Machine Instruction Set Architecture
ABZ '08 Proceedings of the 1st international conference on Abstract State Machines, B and Z
Resource conflict detection in simulation of function unit pipelines
Journal of Systems Architecture: the EUROMICRO Journal
Predictable programming on a precision timed architecture
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Leveraging on-chip networks for data cache migration in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
FPGA Architecture: Survey and Challenges
Foundations and Trends in Electronic Design Automation
SpotCore: a power-efficient embedded processor for intelligent sensor networks
Proceedings of the ICST 2nd international conference on Body area networks
IP Routing table compaction and sampling schemes to enhance TCAM cache performance
Journal of Systems Architecture: the EUROMICRO Journal
Learning heuristics for basic block instruction scheduling
Journal of Heuristics
Embedded DSP Processor Design: Application Specific Instruction Set Processors
Embedded DSP Processor Design: Application Specific Instruction Set Processors
Processor Description Languages
Processor Description Languages
Solving dense linear systems on platforms with multiple hardware accelerators
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Static Cache Partitioning Robustness Analysis for Embedded On-Chip Multi-processors
Transactions on High-Performance Embedded Architectures and Compilers I
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Modelling and verification of superscalar Micro-architectures functional approach
ICCOMP'08 Proceedings of the 12th WSEAS international conference on Computers
A DSP-enhanced 32-bit embedded microprocessor
Journal of Embedded Computing - Selected papers of EUC 2005
Mathematical Structures in Computer Science
Issue Mechanism for Embedded Simultaneous Multithreading Processor
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Cache Controller Design on Ultra Low Leakage Embedded Processors
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Functional test generation using design and property decomposition techniques
ACM Transactions on Embedded Computing Systems (TECS)
Automatic constraint based test generation for behavioral HDL models
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design of an interconnect architecture and signaling technology for parallelism in communication
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Memory hierarchy exploration for accelerating the parallel computation of SVDs
Neural, Parallel & Scientific Computations
Instruction-Level Fault Tolerance Configurability
Journal of Signal Processing Systems
An Interactive Approach to Timing Accurate PCI-X Simulation
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Configurable emulated shared memory architecture for general purpose MP-SOCs and NOC regions
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
Programmable and Scalable Architecture for Graphics Processing Units
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors
Journal of Signal Processing Systems
The Design and Evaluation of a Selective Way Based Trace Cache
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Architecture Design for Soft Errors
Architecture Design for Soft Errors
Some resources for teaching concurrency
Proceedings of the 7th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging
A decentralised task mapping approach for homogeneous multiprocessor network-on-chips
International Journal of Reconfigurable Computing - Selected papers from ReCoSoc08
81.6 GOPS object recognition processor based on a memory-centric NoC
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A configurable MIPS simulator for teaching computer architecture
CATE '07 Proceedings of the 10th IASTED International Conference on Computers and Advanced Technology in Education
Automating the generation of composed linear algebra kernels
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Runlength-based processing methods for low bit-depth images
IEEE Transactions on Image Processing
Cache replacement policies for IP address lookups
CSS '07 Proceedings of the Fifth IASTED International Conference on Circuits, Signals and Systems
Evaluating multicore algorithms on the unified memory model
Scientific Programming - Software Development for Multi-core Computing Systems
Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Design Space Exploration for an ASIP/Co-Processor Architecture used in GNSS Receivers
Journal of Signal Processing Systems
A novel compaction scheme for routing tables in TCAM to enhance cache hit rate
CIIT '07 The Sixth IASTED International Conference on Communications, Internet, and Information Technology
A Novel instruction stream buffer for VLIW architectures
Computers and Electrical Engineering
Finding representative workloads for computer system design
Finding representative workloads for computer system design
Low-power branch prediction techniques for VLIW architectures: a compiler-hints based approach
Integration, the VLSI Journal - Special issue: ACM great lakes symposium on VLSI
Entropy representation of memory access characteristics and cache performance
ACST '08 Proceedings of the Fourth IASTED International Conference on Advances in Computer Science and Technology
Queuing theoretic model for a multiprocessor with private caches and shared memory
ACM SIGARCH Computer Architecture News
Hardware implementation of strategies for servicing queues
CompSysTech '09 Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing
The worst page-replacement policy
FUN'07 Proceedings of the 4th international conference on Fun with algorithms
A configurable multi-ported register file architecture for soft processor cores
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
High level performance metrics for FPGA-based multiprocessor systems
Performance Evaluation
Resource conflict detection in simulation of function unit pipelines
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Evaluating a low-power dual-core architecture
APPT'07 Proceedings of the 7th international conference on Advanced parallel processing technologies
An algorithm to improve parallelism in distributed systems using asynchronous calls
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Parallel multiprocessor approaches to the RNA folding problem
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Mining Query Logs: Turning Search Usage Data into Knowledge
Foundations and Trends in Information Retrieval
Enigma: architectural and operating system support for reducing the impact of address translation
Proceedings of the 24th ACM International Conference on Supercomputing
Power-aware BTB for modern processors
Computers and Electrical Engineering
Applied inference: Case studies in microarchitectural design
ACM Transactions on Architecture and Code Optimization (TACO)
A multi-streaming SIMD multimedia computing engine
Microprocessors & Microsystems
On the operating unit size of load/store architectures†
Mathematical Structures in Computer Science
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A cooperative sensing based spectrum broker for dynamic spectrum access
MILCOM'09 Proceedings of the 28th IEEE conference on Military communications
Efficient decision ordering techniques for SAT-based test generation
Proceedings of the Conference on Design, Automation and Test in Europe
MB-LITE: a robust, light-weight soft-core implementation of the MicroBlaze architecture
Proceedings of the Conference on Design, Automation and Test in Europe
Journal of Signal Processing Systems
Aircraft integration real-time simulator modeling with AADL for architecture tradeoffs
Proceedings of the Conference on Design, Automation and Test in Europe
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Register allocation with instruction scheduling for VLIW-architectures
Programming and Computing Software
Dynamically reconfigurable cache architecture using adaptive block allocation policy
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploiting locality: a flexible DSM approach
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
Exploitation of multicore systems in a java virtual machine
IBM Journal of Research and Development
A pattern language for parallelizing irregular algorithms
Proceedings of the 2010 Workshop on Parallel Programming Patterns
International Journal of Intelligent Systems Technologies and Applications
FastScale: accelerate RAID scaling by minimizing data migration
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Quantitative analysis and optimization techniques for on-chip cache leakage power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Efficient synchronization for embedded on-chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Should we worry about memory loss?
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Atropos: a disk array volume manager for orchestrated use of disks
FAST'04 Proceedings of the 3rd USENIX conference on File and storage technologies
A dynamic workload balancing technique of a text matching algorithm on a cluster
TELE-INFO'06 Proceedings of the 5th WSEAS international conference on Telecommunications and informatics
An extended proof-carrying code framework for security enforcement
Transactions on computational science XI
ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
Branch target buffers: WCET analysis framework and timing predictability
Journal of Systems Architecture: the EUROMICRO Journal
Reducing Network-on-Chip energy consumption through spatial locality speculation
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Column-selection-enabled 8T SRAM array with ~1R/1W multi-port operation for DVFS-enabled processors
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Operating two InfiniBand grid clusters over 28 km distance
International Journal of Grid and Utility Computing
The Journal of Supercomputing
Understanding prediction limits through unbiased branches
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Implementing cryptographic pairings on smartcards
CHES'06 Proceedings of the 8th international conference on Cryptographic Hardware and Embedded Systems
A DSP-Enhanced 32-bit embedded microprocessor
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Program parallelization using synchronized pipelining
LOPSTR'09 Proceedings of the 19th international conference on Logic-Based Program Synthesis and Transformation
A low-power DSP-enhanced 32-bit EISC processor
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Towards cache-optimized multigrid using patch-adaptive relaxation
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
RAPANUI: rapid prototyping for media processor architecture exploration
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Power-aware branch logic: a hardware based technique for filtering access to branch logic
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
PSnAP: accurate synthetic address streams through memory profiles
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Efficient scratchpad allocation algorithms for energy constrained embedded systems
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Architectural enhancements for color image and video processing on embedded systems
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Arithmetic data value speculation
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Application-Specific hardware-driven prefetching to improve data cache performance
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Defining and measuring performance characteristics of current video games
MMB&DFT'10 Proceedings of the 15th international GI/ITG conference on Measurement, Modelling, and Evaluation of Computing Systems and Dependability and Fault Tolerance
A vector approach to cryptography implementation
DRMTICS'05 Proceedings of the First international conference on Digital Rights Management: technologies, Issues, Challenges and Systems
SRAM CP: a charge recycling design schema for SRAM
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
The design of a dataflow coprocessor for low power embedded hierarchical processing
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Compile-Time energy optimization for parallel applications in on-chip multiprocessors
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Cache based power analysis attacks on AES
ACISP'06 Proceedings of the 11th Australasian conference on Information Security and Privacy
Predicting secret keys via branch prediction
CT-RSA'07 Proceedings of the 7th Cryptographers' track at the RSA conference on Topics in Cryptology
Assertion-Based verification for the SpaceCAKE multiprocessor – a case study
HVC'05 Proceedings of the First Haifa international conference on Hardware and Software Verification and Testing
Dynamic Cache Reconfiguration for Soft Real-Time Systems
ACM Transactions on Embedded Computing Systems (TECS)
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Two
Parallel iterative compilation: using MapReduce to speedup machine learning in compilers
Proceedings of third international workshop on MapReduce and its Applications Date
Distributed approximate spectral clustering for large-scale datasets
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Decoding of Raptor codes on embedded systems
Microprocessors & Microsystems
A novel AES-256 implementation on FPGA using co-processor based architecture
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
INVISIOS: A Lightweight, Minimally Intrusive Secure Execution Environment
ACM Transactions on Embedded Computing Systems (TECS)
Probabilistic resource allocation in heterogeneous distributed systems with random failures
Journal of Parallel and Distributed Computing
Memory Latency Hiding by Load Value Speculation for Reconfigurable Computers
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Reducing NBTI-induced processor wearout by exploiting the timing slack of instructions
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Coherent out-of-core point-based global illumination
EGSR'11 Proceedings of the Twenty-second Eurographics conference on Rendering
Case study of multithreaded in-core isosurface extraction algorithms
EG PGV'04 Proceedings of the 5th Eurographics conference on Parallel Graphics and Visualization
Improving communication latency with the write-only architecture
Journal of Parallel and Distributed Computing
API compilation for image hardware accelerators
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Analytical modeling for multi-transaction bus on distributed systems
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Interactive visualization for memory reference traces
EuroVis'08 Proceedings of the 10th Joint Eurographics / IEEE - VGTC conference on Visualization
A Resource-Aware Dynamic Load-Balancing Parallelization Algorithm in a Farmer-Worker Environment
International Journal of Adaptive, Resilient and Autonomic Systems
Fast Likelihood Computation in Speech Recognition using Matrices
Journal of Signal Processing Systems
Step-by-step design and simulation of a simple CPU architecture
Proceeding of the 44th ACM technical symposium on Computer science education
Ten Years of Building Broken Chips: The Physics and Engineering of Inexact Computing
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
On the Simulation of HCI-Induced Variations of IC Timings at High Level
Journal of Electronic Testing: Theory and Applications
Trace construction using enhanced performance monitoring
Proceedings of the ACM International Conference on Computing Frontiers
Scaling energy per operation via an asynchronous pipeline
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and Evaluation of a New Approach to RAID-0 Scaling
ACM Transactions on Storage (TOS)
Automated generation of directed tests for transition coverage in cache coherence protocols
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
NBTI mitigation by optimized NOP assignment and insertion
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
A new routing scheme for Jellyfish and its performance with HPC workloads
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Journal of Computer and System Sciences
On the Impact of Performance Faults in Modern Microprocessors
Journal of Electronic Testing: Theory and Applications
Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
The roles of mathematics in computer science
ACM Inroads
VLSI Design - Special issue on Advanced VLSI Design Methodologies for Emerging Industrial Multimedia and Communication Applications
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Section on Intelligent Mobile Knowledge Discovery and Management Systems and Special Issue on Social Web Mining
Analysis of dependence tracking algorithms for task dataflow execution
ACM Transactions on Architecture and Code Optimization (TACO)
EVA: an efficient vision architecture for mobile systems
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Optimizing image processing on multi-core CPUs with Intel parallel programming technologies
Multimedia Tools and Applications
On the Behaviours Produced by Instruction Sequences under Execution
Fundamenta Informaticae
Hi-index | 0.01 |
From the Book:I am very lucky to have studied computer architecture under Prof. David Patterson at U.C. Berkeley more than 20 years ago. I enjoyed the courses I took from him, in the early days of RISC architecture. Since leaving Berkeley to help found Sun Microsystems, I have used the ideas from his courses and many more that are described in this important book. The good news today is that this book covers incredibly important and contemporary material. The further good news is that much exciting and challenging work remains to be done, and that working from Computer Architecture: A Quantitative Approach is a great way to start. The most successful architectural projects that I have been involved in have always started from simple ideas, with advantages explainable using simple numerical models derived from hunches and rules of thumb. The continuing rapid advances in computing technology and new applications ensure that we will need new similarly simple models to understand what is possible in the future, and that new classes of applications will stress systems in different and interesting ways. The quantitative approach introduced in Chapter 1 is essential to understanding these issues. In particular, we expect to see, in the near future, much more emphasis on minimizing power to meet the demands of a given application, across all sizes of systems; much remains to be learned in this area. I have worked with many different instruction sets in my career. I first programmed a PDP-8, whose instruction set was so simple that a friend easily learned to disassemble programs just by glancing at the hole punches in paper tape! I wrote a lot of code in PDP-11 assembler, including an interpreter for the Pascal programming language and for the VAX (which was used as an example in the first edition of this book); the success of the VAX led to the widespread use of UNIX on the early Internet. The PDP-11 and VAX were very conventional complex instruction set (CISC) computer architectures, with relatively compact instruction sets that proved nearly impossible to pipeline. For a number of years in public talks I used the performance of the VAX 11/780 as the baseline; its speed was extremely well known because faster implementations of the architecture were so long delayed. VAX performance stalled out just as the x86 and 680x0 CISC architectures were appearing in microprocessors; the strong economic advantages of microprocessors led to their overwhelming dominance. Then the simpler reduced instruction set (RISC) computer architectures聴pioneered by John Cocke at IBM; promoted and named by Patterson and Hennessy; and commercialized in POWER PC, MIPS, and SPARC聴were implemented as microprocessors and permitted highperformance pipeline implementations through the use of their simple registeroriented instruction sets. A downside of RISC was the larger code size of programs and resulting greater instruction fetch bandwidth, a cost that could be seen to be acceptable using the techniques of Chapter 1 and by believing in the future CMOS technology trends promoted in the now-classic views of Carver Mead. The kind of clear-thinking approach to the present problems and to the shape of future computing advances that led to RISC architecture is the focus of this book. Chapter 2 (and various appendices) presents interesting examples of contemporary and important historical instruction set architecture. RISC architecture聴the focus of so much work in the last twenty years聴is by no means the final word here. I worked on the design of the SPARC architecture and several implementations for a decade, but more recently have worked on two different styles of processor: picoJava, which implemented most of the Java Virtual Machine instructions聴a compact, high-level, bytecoded instruction set聴and MAJC, a very simple and multithreaded VLIW for Java and media-intensive applications. These two architectures addressed different and new market needs: for lowpower chips to run embedded devices where space and power are at a premium, and for high performance for a given amount of power and cost where parallel applications are possible. While neither has achieved widespread commercial success, I expect that the future will see many opportunities for different ISAs, and an in-depth knowledge of history here often gives great guidance聴the relationships between key factors, such as the program size, execution speed, and power consumption, returning to previous balances that led to great designs in the past. Chapters 3 and 4 describe instruction-level parallelism (ILP): the ability to execute more than one instruction at a time. This has been aided greatly, in the last 20 years, by techniques such as RISC and VLIW (very long instruction word) computing. But as later chapters here point out, both RISC and especially VLIW as practiced in the Intel itanium architecture are very power intensive. In our attempts to extract more instruction-level parallelism, we are running up against the fact that the complexity of a design that attempts to execute N instructions simultaneously grows like N2: the number of transistors and number of watts to produce each result increases dramatically as we attempt to execute many instructions of arbitrary programs simultaneously. There is thus a clear countertrend emerging: using simpler pipelines with more realistic levels of ILP while exploiting other kinds of parallelism by running both multiple threads of execution per processor and, often, multiple processors on a single chip. The challenge for designers of high-performance systems of the future is to understand when simultaneous execution is possible, but then to use these techniques judiciously in combination with other, less granular techniques that are less power intensive and complex. In graduate school I would often joke that cache memories were the only great idea in computer science. But truly, where you put things affects profoundly the design of computer systems. Chapter 5 describes the classical design of cache and main memory hierarchies and virtual memory. And now, new, higher-level programming languages like Java support much more reliable software because they insist on the use of garbage collection and array bounds checking, so that security breaches from "buffer overflow" and insidious bugs from false sharing of memory do not creep into large programs. It is only languages, such as Java, that insist on the use of automatic storage management that can implement true software components. But garbage collectors are notoriously hard on memory hierarchies, and the design of systems and language implementations to work well for such areas is an active area of research, where much good work has been done but much exciting work remains. Java also strongly supports thread-level parallelism聴a key to simple, powerefficient, and high-performance system implementations that avoids the N2 problem discussed earlier but brings challenges of its own. A good foundational understanding of these issues can be had in Chapter 6. Traditionally, each processor was a separate chip, and keeping the various processors synchronized was expensive, both because of its impact on the memory hierarchy and because the synchronization operations themselves were very expensive. The Java language is also trying to address these issues: we tried, in the Java Language Specification, which I coauthored, to write a description of the memory model implied by the language. While this description turned out to have (fixable) technical problems, it is increasingly clear that we need to think about the memory hierarchy in the design of languages that are intended to work well on the newer system platforms. We view the Java specification as a first step in much good work to be done in the future. As Chapter 7 describes, storage has evolved from being connected to individual computers to being a separate network resource. This is reminiscent of computer graphics, where graphics processing that was previously done in a host processor often became a separate function as the importance of graphics increased. All this is likely to change radically in the coming years聴massively parallel host processors are likely to be able to do graphics better than dedicated outboard graphics units, and new breakthroughs in storage technologies, such as memories made from molecular electronics and other atomic-level nanotechnologies, should greatly reduce both the cost of storage and the access time. The resulting dramatic decreases in storage cost and access time will strongly encourage the use of multiple copies of data stored on individual computing nodes, rather than shared over a network. The "wheel of reincarnation," familiar from graphics, will appear in storage. Chapter 8 provides a great foundational description of computer interconnects and networks. My model of these comes from Andy Bechtolsheim, another of the cofounders of Sun, who famously said, "Ethernet always wins."More modestly stated: given the need for a new networking interconnect, and despite its shortcomings, adapted versions of the Ethernet protocols seem to have met with overwhelming success in the marketplace. Why? Factors such as the simplicity and familiarity of the protocols are obvious, but quite possibly the most likely reason is that the people who are adapting Ethernet can get on with the job at hand rather than arguing about details that, in the end, aren聮t dispositive. This lesson can be generalized to apply to all the areas of computer architecture discussed in this book. One of the things I remember Dave Patterson saying many years ago is that for each new project you only get so many "cleverness beans." That is, you can be very clever in a few areas of your design, but if you try to be clever in all of them, the design will probably fail to achieve its goals聴or even fail to work or to be finished at all. The overriding lesson that I have learned in 20 plus years of working on these kinds of designs is that you must choose what is important and focus on that; true wisdom is to know what to leave out. A deep knowledge of what has gone before is key to this ability. And you must also choose your assumptions carefully. Many years ago I attended a conference in Hawaii (yes, it was a boondoggle, but read on) where Maurice Wilkes, the legendary computer architect, gave a speech. What he said, paraphrased in my memory, is that good research often consists of assuming something that seems untrue or unlikely today will become true and investigating the consequences of that assumption. And if the unlikely assumption indeed then becomes true in the world, you will have done timely and sometimes, then, even great research! So, for example, the research group at Xerox PARC assumed that everyone would have access to a personal computer with a graphics display connected to others by an internetwork and the ability to print inexpensively using Xerography. How true all this became, and how seminally important their work was! In our time, and in the field of computer architecture, I think there are a number of assumptions that will become true. Some are not controversial, such as that Moore聮s Law is likely to continue for another decade or so and that the complexity of large chip designs is reaching practical limits, often beyond the point of positive returns for additional complexity. More controversially, perhaps, molecular electronics is likely to greatly reduce the cost of storage and probably logic elements as well, optical interconnects will greatly increase the bandwidth and reduce the error rates of interconnects, software will continue to be unreliable because it is so difficult, and security will continue to be important because its absence is so debilitating. Taking advantage of the strong positive trends detailed in this book and using them to mitigate the negative ones will challenge the next generation of computer architects, to design a range of systems of many shapes and sizes. Computer architecture design problems are becoming more varied and interesting. Now is an exciting time to be starting out or reacquainting yourself with the latest in this field, and this book is the best place to start. See you in the chips!