High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The design and verification of the AlphaStation 600 5-series workstation
Digital Technical Journal - Special 10th anniversary issue
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Register File Design Considerations in Dynamically Scheduled Processors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Initial results on the performance and cost of vector microprocessors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ICS '98 Proceedings of the 12th international conference on Supercomputing
Speculative execution model with duplication
ICS '98 Proceedings of the 12th international conference on Supercomputing
Resource widening versus replication: limits and performance-cost trade-off
ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Widening resources: a cost-effective technique for aggressive ILP architectures
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Randomized Cache Placement for Eliminating Conflicts
IEEE Transactions on Computers - Special issue on cache memory and related problems
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Proceedings of the 1999 ACM symposium on Applied computing
Improving the performance of speculatively parallel applications on the Hydra CMP
ICS '99 Proceedings of the 13th international conference on Supercomputing
Using complete system simulation to characterize SPECjvm98 benchmarks
Proceedings of the 14th international conference on Supercomputing
Extending Value Reuse to Basic Blocks with Compiler Support
IEEE Transactions on Computers
A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Slipstream processors: improving both performance and fault tolerance
ACM SIGPLAN Notices
A study of slipstream processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Relational profiling: enabling thread-level parallelism in virtual machines
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications
IEEE Transactions on Computers
An analysis of operating system behavior on a simultaneous multithreaded architecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
IEEE Transactions on Computers
IEEE Transactions on Parallel and Distributed Systems
SMT Layout Overhead and Scalability
IEEE Transactions on Parallel and Distributed Systems
Characterizing operating system activity in SPECjvm98 Benchmarks
Workload characterization of emerging computer applications
An energy saving strategy based on adaptive loop parallelization
Proceedings of the 39th annual Design Automation Conference
Proceedings of the 39th annual Design Automation Conference
Proceedings of the 15th international symposium on System Synthesis
OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors
Proceedings of the 15th international symposium on System Synthesis
Computer
IEEE Micro
Typing the ISA to cluster the processor
Future Generation Computer Systems - Parallel computing technologies (PaCT-2001)
Parallel simulation of chip-multiprocessor architectures
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Hardware and Software Optimizations for Multimedia Databases
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
Typing the ISA to Cluster the Processor
PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
Improving Conditional Branch Prediction on Speculative Multithreading Architectures
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Fine-grain design space exploration for a cartographic SoC multiprocessor
ACM SIGARCH Computer Architecture News
Efficient Self-Timed Interfaces for Crossing Clock Domains
ASYNC '03 Proceedings of the 9th International Symposium on Asynchronous Circuits and Systems
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Exploring Microprocessor Architectures for Gigascale Integration
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Slipstream Execution Mode for CMP-Based Multiprocessors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Memory System Behavior of Java-Based Middleware
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
A trace-level value predictor for Contrail processors
ACM SIGARCH Computer Architecture News
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Scaling and Charact rizing Database Workloads: Bridging the Gap between Research and Practice
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Modeling technology impact on cluster microprocessor performance
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors
Proceedings of the conference on Design, automation and test in Europe - Volume 2
A Simple Mechanism for Detecting Ineffectual Instructions in Slipstream Processors
IEEE Transactions on Computers
Cluster prefetch: tolerating on-chip wire delays in clustered microarchitectures
Proceedings of the 18th annual international conference on Supercomputing
A case for shared instruction cache on chip multiprocessors running OLTP
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Area and System Clock Effects on SMT/CMP Throughput
IEEE Transactions on Computers
Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Using data compression in an MPSoC architecture for improving performance
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Proceedings of the 42nd annual Design Automation Conference
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Understanding the energy efficiency of SMT and CMP with multiclustering
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Fast and fair: data-stream quality of service
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Queue - Multiprocessors
SST: Symbolic Subordinate Threading
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Heterogeneous Chip Multiprocessors
Computer
Performance implications of single thread migration on a chip multi-core
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Energy-aware computation duplication for improving reliability in embedded chip multiprocessors
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
SCMP: a single-chip message-passing parallel computer
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Selective code/data migration for reducing communication energy in embedded MpSoC architectures
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
Dynamic thread assignment on heterogeneous multiprocessor architectures
Proceedings of the 3rd conference on Computing frontiers
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Spin Detection Hardware for Improved Management of Multithreaded Systems
IEEE Transactions on Parallel and Distributed Systems
Optimizing bus energy consumption of on-chip multiprocessors using frequent values
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A case for chip multiprocessors based on the data-driven multithreading model
International Journal of Parallel Programming
Thread-associative memory for multicore and multithreaded computing
Proceedings of the 2006 international symposium on Low power electronics and design
Reducing power through compiler-directed barrier synchronization elimination
Proceedings of the 2006 international symposium on Low power electronics and design
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Supporting microthread scheduling and synchronisation in CMPs
International Journal of Parallel Programming
Enabling real-time physics simulation in future interactive entertainment
Proceedings of the 2006 ACM SIGGRAPH symposium on Videogames
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
TMA: a trap-based memory architecture
Proceedings of the 20th annual international conference on Supercomputing
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
SMP-SoC is the answer if you ask the right questions
SAICSIT '06 Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
Hybrid multi-core architecture for boosting single-threaded performance
ACM SIGARCH Computer Architecture News
From chaos to QoS: case studies in CMP resource management
ACM SIGARCH Computer Architecture News
Performance of multithreaded chip multiprocessors and implications for operating system design
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Implications of Rent's Rule for NoC Design and Its Fault-Tolerance
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A memory-conscious code parallelization scheme
Proceedings of the 44th annual Design Automation Conference
Evaluating design tradeoffs in on-chip power management for CMPs
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Implementing a WLAN video terminal using UML and fully automated design flow
EURASIP Journal on Embedded Systems
Thermal-aware scheduling for future chip multiprocessors
EURASIP Journal on Embedded Systems
Light-weight synchronization for inter-processor communication acceleration on embedded MPSoCs
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Design principles for a virtual multiprocessor
Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Proceedings of the 13th international symposium on Low power electronics and design
A novel migration-based NUCA design for chip multiprocessors
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Leveraging on-chip networks for data cache migration in chip multiprocessors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Meeting points: using thread criticality to adapt multicore hardware to parallel regions
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Towards achieving reliable and high-performance nanocomputing via dynamic redundancy allocation
ACM Journal on Emerging Technologies in Computing Systems (JETC)
In-Network Caching for Chip Multiprocessors
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Exploring multicore computing education in China by model curriculum construction
SCE '08 Proceedings of the 1st ACM Summit on Computing Education in China on First ACM Summit on Computing Education in China
Mostly static program partitioning of binary executables
ACM Transactions on Programming Languages and Systems (TOPLAS)
Proceedings of the 36th annual international symposium on Computer architecture
Instruction-Level Fault Tolerance Configurability
Journal of Signal Processing Systems
Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design
ACM SIGARCH Computer Architecture News
Automatic Hybrid MPI+OpenMP Code Generation with llc
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications
Journal of Signal Processing Systems
Fixed-priority scheduling on prioritized SMT processor
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Real-time static voltage scaling on multiprocessors
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Increasing memory miss tolerance for SIMD cores
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Using data compression for increasing memory system utilization
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Effect of increasing chip density on the evolution of computer architectures
IBM Journal of Research and Development
Characterizing the resource-sharing levels in the UltraSPARC T2 processor
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Finding representative workloads for computer system design
Finding representative workloads for computer system design
Optimal real-time scheduling for efficient aperiodic services on multiprocessors
PDCN '08 Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks
Soft real-time scheduling with tardiness bounds on prioritized SMT processors
PDCN '08 Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks
A parallel infrastructure on dynamic EPIC SMT
ICA3PP'07 Proceedings of the 7th international conference on Algorithms and architectures for parallel processing
Performance impact of resource conflicts on chip multi-processor servers
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Constraint-aware large-scale CMP cache design
HiPC'07 Proceedings of the 14th international conference on High performance computing
Exploit temporal locality of shared data in SRC enabled CMP
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
The SKB: a semi-completely-connected bus for on-chip systems
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
HiPC'08 Proceedings of the 15th international conference on High performance computing
Chip multiprocessor based on data-driven multithreading model
International Journal of High Performance Systems Architecture
Area-efficient floorplans and interconnects for homogeneous multi-core architectures
International Journal of High Performance Systems Architecture
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Proceedings of the 37th annual international symposium on Computer architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Language virtualization for heterogeneous parallel computing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
PM-COSYN: PE and memory co-synthesis for MPSoCs
Proceedings of the Conference on Design, Automation and Test in Europe
PoliMakE: a policy making engine for secure embedded software execution on chip-multiprocessors
WESS '10 Proceedings of the 5th Workshop on Embedded Systems Security
Efficient address mapping of shared cache for on-chip many-core architecture
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Thread owned block cache: managing latency in many-core architecture
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
An evaluation of OpenMP on current and emerging multithreaded/multicore processors
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Evaluating OpenMP on chip multithreading platforms
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A domain-specific approach to heterogeneous parallelism
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Emerging sensing techniques for emerging memories
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Memory-, bandwidth-, and power-aware multi-core for a graph database workload
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Robust adaptation to available parallelism in transactional memory applications
Transactions on high-performance embedded architectures and compilers III
A minimalist cache coherent MPSoC designed for FPGAs
International Journal of High Performance Systems Architecture
A composite and scalable cache coherence protocol for large scale CMPs
Proceedings of the international conference on Supercomputing
Processing (multiple) spatio-temporal range queries in multicore settings
ADBIS'11 Proceedings of the 15th international conference on Advances in databases and information systems
The ReNoC Reconfigurable Network-on-Chip: Architecture, Configuration Algorithms, and Evaluation
ACM Transactions on Embedded Computing Systems (TECS)
Performance evaluation of a chip-multithreading server for high performance computing applications
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
L2-Cache hierarchical organizations for multi-core architectures
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Hardware budget and runtime system for data-driven multithreaded chip multiprocessor
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Acceleration techniques for chip-multiprocessor simulator debug
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Variation tolerant sensing scheme of spin-transfer torque memory for yield improvement
Proceedings of the International Conference on Computer-Aided Design
Toward enhancing OpenMP's work-sharing directives
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
A fair thread-aware memory scheduling algorithm for chip multiprocessor
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
DDM-CMP: data-driven multithreading on a chip multiprocessor
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Automated distribution of UML 2.0 designed applications to a configurable multiprocessor platform
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Reliability-aware core partitioning in chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Integrating Memory Optimization with Mapping Algorithms for Multi-Processors System-on-Chip
ACM Transactions on Embedded Computing Systems (TECS)
Thread vulnerability in parallel applications
Journal of Parallel and Distributed Computing
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
Support for fine-grained synchronization in shared-memory multiprocessors
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Register file management and compiler optimization on EDSMT
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
A Parallel infrastructure on dynamic EPIC SMT and its speculation optimization
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Fair CPU time accounting in CMP+SMT processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
FROCM: a fair and low-overhead method in SMT processor
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
MultiMaKe: Chip-multiprocessor driven memory-aware kernel pipelining
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
NoCAlert: An On-Line and Real-Time Fault Detection Mechanism for Network-on-Chip Architectures
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Voltage driven nondestructive self-reference sensing scheme of spin-transfer torque memory
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Performance-reliability tradeoff analysis for multithreaded applications
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
A hyperscalar dual-core architecture for embedded systems
Microprocessors & Microsystems
Exploiting replication to improve performances of NUCA-based CMP systems
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Parallel flow routing in SWMM 5
Environmental Modelling & Software
Hi-index | 0.02 |
Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we look for new ways to use their capabilities effectively. This paper shows that in advanced technologies it is possible to implement a single-chip multiprocessor in the same area as a wide issue superscalar processor. We find that for applications with little parallelism the performance of the two microarchitectures is comparable. For applications with large amounts of parallelism at both the fine and coarse grained levels, the multiprocessor microarchitecture outperforms the superscalar architecture by a significant margin. Single-chip multiprocessor architectures have the advantage in that they offer localized implementation of a high-clock rate processor for inherently sequential applications and low latency interprocessor communication for parallel applications.