Principles of CMOS VLSI design: a systems perspective
Principles of CMOS VLSI design: a systems perspective
An investigation of the performance of various dynamic scheduling techniques
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The performance impact of incomplete bypassing in processor pipelines
Proceedings of the 28th annual international symposium on Microarchitecture
The PowerPC 604 RISC microprocessor
IEEE Micro
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
A comparison of data prefetching on an access decoupled and superscalar machine
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting idle floating-point resources for integer execution
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Speculative multithreaded processors
ICS '98 Proceedings of the 12th international conference on Supercomputing
ICS '98 Proceedings of the 12th international conference on Supercomputing
Vector architectures: past, present and future
ICS '98 Proceedings of the 12th international conference on Supercomputing
The energy complexity of register files
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
An empirical study of decentralized ILP execution models
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Decoupling local variable accesses in a wide-issue superscalar processor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A scalable front-end architecture for fast instruction delivery
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Adding a vector unit to a superscalar processor
ICS '99 Proceedings of the 13th international conference on Supercomputing
Increasing effective IPC by exploiting distant parallelism
ICS '99 Proceedings of the 13th international conference on Supercomputing
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
A comparison of scalable superscalar processors
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Instruction fetch mechanisms for multipath execution processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Access region locality for high-bandwidth processor memory system design
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Delaying physical register allocation through virtual-physical registers
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning
International Journal of Parallel Programming
Push vs. pull: data movement for linked data structures
Proceedings of the 14th international conference on Supercomputing
Proceedings of the 14th international conference on Supercomputing
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Circuits for wide-window superscalar processors
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
Optimization of high-performance superscalar architectures for energy efficiency
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
ACM Transactions on Computer Systems (TOCS)
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing wire delay penalty through value prediction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Performance improvement with circuit-level speculation
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications
IEEE Transactions on Computers
A circuit level implementation of an adaptive issue queue for power-aware microprocessors
GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Inherently Lower-Power High-Performance Superscalar Architectures
IEEE Transactions on Computers
Optimizations Enabled by a Decoupled Front-End Architecture
IEEE Transactions on Computers
Reducing the complexity of the issue logic
ICS '01 Proceedings of the 15th international conference on Supercomputing
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Dynamically allocating processor resources between nearby and distant ILP
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Instruction scheduling for clustered VLIW architectures
ISSS '00 Proceedings of the 13th international symposium on System synthesis
A High-Bandwidth Memory Pipeline for Wide Issue Processors
IEEE Transactions on Computers
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
IEEE Transactions on Parallel and Distributed Systems
Errata on "Measuring Experimental Error in Microprocessor Simulation"
ACM SIGARCH Computer Architecture News
An interleaved cache clustered VLIW processor
ICS '02 Proceedings of the 16th international conference on Supercomputing
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Efficient dynamic scheduling through tag elimination
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Slack: maximizing performance under technological constraints
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A scalable instruction queue design using dependence chains
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Application domains for fixed-length block structured architectures
ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
Multithreading decoupled architectures for complexity-effective general purpose computing
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Trident: a scalable architecture for scalar, vector, and matrix operations
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing power with dynamic critical path information
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A high-speed dynamic instruction scheduling scheme for superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Tradeoffs in power-efficient issue queue design
Proceedings of the 2002 international symposium on Low power electronics and design
Energy-efficient hybrid wakeup logic
Proceedings of the 2002 international symposium on Low power electronics and design
Asymmetric-frequency clustering: a power-aware back-end for high-performance processors
Proceedings of the 2002 international symposium on Low power electronics and design
Asynchrony in parallel computing: from dataflow to multithreading
Progress in computer research
Dynamic Code Partitioning for Clustered Architectures
International Journal of Parallel Programming
Typing the ISA to cluster the processor
Future Generation Computer Systems - Parallel computing technologies (PaCT-2001)
Instruction Level Distributed Processing
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Hierarchical Interconnects for On-Chip Clustering
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Comparison of Two Architectural Power Models
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
TEM2P2EST: A Thermal Enabled Multi-model Power/Performance ESTimator
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
An Adaptive Issue Queue for Reduced Power at High Performance
PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers
Typing the ISA to Cluster the Processor
PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
Efficient Interconnects for Clustered Microarchitectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Instruction Level Distributed Processing: Adapting to Future Technology
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
The Case for Speculative Multithreading on SMT Processors
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Speculative Clustered Caches for Clustered Processors
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Instruction Wake-Up in Wide Issue Superscalars
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Performance of the Complex Streamed Instruction Set on Image Processing Kernels
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Performance Scalability of Multimedia Instruction Set Extensions
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Decoupling Recovery Mechanism for Data Speculation from Dynamic Instruction Scheduling Structure
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Characterizing and predicting value degree of use
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Hierarchical Scheduling Windows
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Reducing register ports for higher speed and lower energy
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power protocol: reducing power dissipation on off-chip data buses
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power-aware issue queue design for speculative instructions
Proceedings of the 40th annual Design Automation Conference
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Improving quasi-dynamic schedules through region slip
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Partitioned first-level cache design for clustered microarchitectures
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Exploring Microprocessor Architectures for Gigascale Integration
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
The Ultrascalar Processor-An Asymptotically Scalable Superscalar Microarchitecture
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Impact of Power Density Limitation in Gigascale Integration for the SIMD Pixel Processor
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Power-Aware Control Speculation through Selective Throttling
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Deterministic Clock Gating for Microprocessor Power Reduction
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Front-End Policies for Improved Issue Efficiency in SMT Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Cheap Out-of-Order Execution Using Delayed Issue
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
A Power Perspective of Value Speculation for Superscalar Microprocessors
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Synthesis Experiments and Performance Metrics for Evaluating the Quality of IP Blocks and Megacells
ISQED '00 Proceedings of the 1st International Symposium on Quality of Electronic Design
Proceedings of the 30th annual international symposium on Computer architecture
Implicitly-multithreaded processors
Proceedings of the 30th annual international symposium on Computer architecture
Banked multiported register files for high-frequency superscalar microprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Cyclone: a broadcast-free dynamic instruction scheduler with selective replay
Proceedings of the 30th annual international symposium on Computer architecture
Improving dynamic cluster assignment for clustered trace cache processors
Proceedings of the 30th annual international symposium on Computer architecture
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
Checkpointing alternatives for high performance, power-aware processors
Proceedings of the 2003 international symposium on Low power electronics and design
A mixed-clock issue queue design for globally asynchronous, locally synchronous processor cores
Proceedings of the 2003 international symposium on Low power electronics and design
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Value Locality in Physical Register Files
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Macro-op Scheduling: Relaxing Scheduling Loop Constraints
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Energy-efficient issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Modeling technology impact on cluster microprocessor performance
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Constructive timing violation for improving energy efficiency
Compilers and operating systems for low power
Implementation of a streaming execution unit
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Synthesis and verification
Processor-memory coexploration using an architecture description language
ACM Transactions on Embedded Computing Systems (TECS)
Scaling the issue window with look-ahead latency prediction
Proceedings of the 18th annual international conference on Supercomputing
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures
Proceedings of the 18th annual international conference on Supercomputing
Wire Delay is Not a Problem for SMT (In the Near Future)
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
Proceedings of the 31st annual international symposium on Computer architecture
A low-power in-order/out-of-order issue queue
ACM Transactions on Architecture and Code Optimization (TACO)
Application adaptive energy efficient clustered architectures
Proceedings of the 2004 international symposium on Low power electronics and design
Mixed-clock issue queue design for energy aware, high-performance cores
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Power-performance trade-off using pipeline delays
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Late Allocation and Early Release of Physical Registers
IEEE Transactions on Computers
A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
A scalable, clustered SMT processor for digital signal processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Effects of speculation on performance and issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Toward kilo-instruction processors
ACM Transactions on Architecture and Code Optimization (TACO)
An analysis of a resource efficient checkpoint architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Inherently Workload-Balanced Clustered Microarchitecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Control-Flow Independence Reuse via Dynamic Vectorization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
A Dependency Chain Clustered Microarchitecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Technology-based Architectural Analysis of Operand Bypass Networks for Efficient Operand Transport
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 15 - Volume 16
Cache organizations for clustered microarchitectures
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Increasing design space of the instruction queue with tag coding
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors
IEEE Transactions on Computers
A Speculative Control Scheme for an Energy-Efficient Banked Register File
IEEE Transactions on Computers
Balancing clustering-induced stalls to improve performance in clustered processors
Proceedings of the 2nd conference on Computing frontiers
A time-predictable execution mode for superscalar pipelines with instruction prescheduling
Proceedings of the 2nd conference on Computing frontiers
An efficient wakeup design for energy reduction in high-performance superscalar processors
Proceedings of the 2nd conference on Computing frontiers
The CSI multimedia architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Execution cache-based microarchitecture power-efficient superscalar processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Variability and energy awareness: a microarchitecture-level perspective
Proceedings of the 42nd annual Design Automation Conference
Static strands: safely collapsing dependence chains for increasing embedded power efficiency
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines
Proceedings of the 32nd annual international symposium on Computer Architecture
Instruction packing: reducing power and delay of the dynamic scheduling logic
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Understanding the energy efficiency of SMT and CMP with multiclustering
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Fast branch misprediction recovery in out-of-order superscalar processors
Proceedings of the 19th annual international conference on Supercomputing
Tornado warning: the perils of selective replay in multithreaded processors
Proceedings of the 19th annual international conference on Supercomputing
An asymmetric clustered processor based on value content
Proceedings of the 19th annual international conference on Supercomputing
Low-power, low-complexity instruction issue using compiler assistance
Proceedings of the 19th annual international conference on Supercomputing
Thread-Level Speculation on a CMP can be energy efficient
Proceedings of the 19th annual international conference on Supercomputing
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors
IEEE Transactions on Parallel and Distributed Systems
Reducing the Latency and Area Cost of Core Swapping through Shared Helper Engines
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Power-Efficient Wakeup Tag Broadcast
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency
IEEE Transactions on Computers
A Criticality Analysis of Clustering in Superscalar Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Balancing Resource Utilization to Mitigate Power Density in Processor Pipelines
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
"Flea-flicker" Multipass Pipelining: An Alternative to the High-Power Out-of-Order Offense
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative execution for hiding memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Control Speculation for Energy-Efficient Next-Generation Superscalar Processors
IEEE Transactions on Computers
An automated design flow for 3D microarchitecture evaluation
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Microarchitecture evaluation with floorplanning and interconnect pipelining
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures
IEEE Transactions on Computers
Instruction packing: Toward fast and energy-efficient instruction scheduling
ACM Transactions on Architecture and Code Optimization (TACO)
Design space exploration for 3D architectures
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Adaptive reorder buffers for SMT processors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
SEED: scalable, efficient enforcement of dependences
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-efficient dynamic instruction scheduling logic through instruction grouping
Proceedings of the 2006 international symposium on Low power electronics and design
Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing
Supporting microthread scheduling and synchronisation in CMPs
International Journal of Parallel Programming
BranchTap: improving performance with very few checkpoints through adaptive speculation control
Proceedings of the 20th annual international conference on Supercomputing
Scientific applications vs. SPEC-FP: a comparison of program behavior
Proceedings of the 20th annual international conference on Supercomputing
A scalable low power issue queue for large instruction window processors
Proceedings of the 20th annual international conference on Supercomputing
Design space exploration for multicore architectures: a power/performance/thermal view
Proceedings of the 20th annual international conference on Supercomputing
Exploiting Operand Availability for Efficient Simultaneous Multithreading
IEEE Transactions on Computers
Register port complexity reduction in wide-issue processors with selective instruction execution
Microprocessors & Microsystems
Unified microprocessor core storage
Proceedings of the 4th international conference on Computing frontiers
By-passing the out-of-order execution pipeline to increase energy-efficiency
Proceedings of the 4th international conference on Computing frontiers
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Proceedings of the 34th annual international symposium on Computer architecture
Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications
IEEE Transactions on Parallel and Distributed Systems
Static strands: Safely exposing dependence chains for increasing embedded power efficiency
ACM Transactions on Embedded Computing Systems (TECS) - Special Section LCTES'05
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
Proceedings of the 21st annual international conference on Supercomputing
Architectural contesting: exposing and exploiting temperamental behavior
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
Scalable Dynamic Instruction Scheduler through Wake-Up Spatial Locality
IEEE Transactions on Computers
Building a large instruction window through ROB compression
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
Hardware support for early register release
International Journal of High Performance Computing and Networking
International Journal of High Performance Computing and Networking
Low power microarchitecture with instruction reuse
Proceedings of the 5th conference on Computing frontiers
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Achieving Out-of-Order Performance with Almost In-Order Complexity
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Power-efficient clustering via incomplete bypassing
Proceedings of the 13th international symposium on Low power electronics and design
Investigating the effects of fine-grain three-dimensional integration on microarchitecture design
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Streamlining long latency instructions for seamlessly combined out-of-order and in-order execution
Microprocessors & Microsystems
A low-complexity microprocessor design with speculative pre-execution
Journal of Systems Architecture: the EUROMICRO Journal
Three-dimensional Integrated Circuit Design
Three-dimensional Integrated Circuit Design
A multithreading embedded architecture
DNCOCO'08 Proceedings of the 7th conference on Data networks, communications, computers
HeDGE: Hybrid Dataflow Graph Execution in the Issue Logic
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Strategies for mapping dataflow blocks to distributed hardware
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Shapeshifter: Dynamically changing pipeline width and speed to address process variations
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Issue Mechanism for Embedded Simultaneous Multithreading Processor
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Proceedings of the 6th ACM conference on Computing frontiers
Accurate Instruction Pre-scheduling in Dynamically Scheduled Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Complexity Effective Bypass Networks
Transactions on High-Performance Embedded Architectures and Compilers II
An energy-efficient instruction scheduler design with two-level shelving and adaptive banking
Journal of Computer Science and Technology
Proceedings of the 36th annual international symposium on Computer architecture
Access region cache with register guided memory reference partitioning
Journal of Systems Architecture: the EUROMICRO Journal
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Reducing branch misprediction penalties via adaptive pipeline scaling
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Decoupled state-execute architecture
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Out-of-order issue logic using sorting networks
Proceedings of the 20th symposium on Great lakes symposium on VLSI
WiDGET: Wisconsin decoupled grid execution tiles
Proceedings of the 37th annual international symposium on Computer architecture
Forwardflow: a scalable core for power-constrained CMPs
Proceedings of the 37th annual international symposium on Computer architecture
Trifecta: a nonspeculative scheme to exploit common, data-dependent subcritical paths
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
VAIL: variation-aware issue logic and performance binning for processor yield and profit improvement
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Task superscalar: using processors as functional units
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
A taxonomy of accelerator architectures and their programming models
IBM Journal of Research and Development
Comparing FPGA vs. custom cmos and the impact on processor microarchitecture
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Efficient synchronization for embedded on-chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Wake-up logic optimizations through selective match and wakeup range limitation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Energy-efficient dynamic instruction scheduling logic through instruction grouping
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DCG: deterministic clock-gating for low-power microprocessor design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
CROB: implementing a large instruction window through compression
Transactions on high-performance embedded architectures and compilers III
Performance evaluation of superscalar processor with multi-bank register file using SPEC2000
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Proceedings of the 38th annual international symposium on Computer architecture
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion
Proceedings of the 8th ACM International Conference on Computing Frontiers
SIFT: a low-overhead dynamic information flow tracking architecture for SMT processors
Proceedings of the 8th ACM International Conference on Computing Frontiers
Architecting processors to allow voltage/reliability tradeoffs
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Stochastic computing: embracing errors in architectureand design of processors and applications
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Functional unit chaining: a runtime adaptive architecture for reducing bypass delays
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
A low-complexity issue queue design with speculative pre-execution
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Low power microprocessor design for embedded systems
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
Design and analysis of adaptive processor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Design and effectiveness of small-sized decoupled dispatch queues
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Non-uniform instruction scheduling
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Instruction recirculation: eliminating counting logic in wakeup-free schedulers
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Single FU bypass networks for high clock rate superscalar processors
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
DDM-CMP: data-driven multithreading on a chip multiprocessor
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
CRAM: coded registers for amplified multiporting
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Global register alias table: Boosting sequential program on multi-core
Future Generation Computer Systems
An optimized front-end physical register file with banking and writeback filtering
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Compiler directed issue queue energy reduction
Transactions on High-Performance Embedded Architectures and Compilers IV
Design principles for synthesizable processor cores
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
Proceedings of the 26th ACM international conference on Supercomputing
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Vector Extensions for Decision Support DBMS Acceleration
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Exploiting Timing Error Resilience in Processor Architecture
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
Low complexity out-of-order issue logic using static circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An out-of-order superscalar processor on FPGA: the ReOrder buffer design
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The sharing architecture: sub-core configurability for IaaS clouds
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.05 |
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8µm, 0.35µm, and 0.18µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future.A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues instructions from multiple queues in parallel. Simulation shows little slowdown as compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster --- consequently overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.