Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Reducing cache misses using hardware and software page placement
ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Architecture and design of AlphaServer GS320
ACM SIGPLAN Notices
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The HSSM macro-architecture, Virtual Machine and H languages
ACM SIGPLAN Notices
Trident: a scalable architecture for scalar, vector, and matrix operations
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Creating Realistic Scenes in Future Multimedia Systems
IEEE MultiMedia
Speculative Multithreaded Processors
Computer
Computer
Parallel simulation of chip-multiprocessor architectures
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Amir Roth: Speculative Multithreaded Processors
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
Return-Address Prediction in Speculative Multithreaded Environments
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Evaluating the XMT Parallel Programming Model
HIPS '01 Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments
Performance of On-Chip Multiprocessors for Vision Tasks
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Hardware and Software Optimizations for Multimedia Databases
PaCT '999 Proceedings of the 5th International Conference on Parallel Computing Technologies
Performances of a Dynamic Threads Scheduler
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Improving the Performance of Heterogeneous DSMs via Multithreading
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Microprocessors - 10 Years Back, 10 Years Ahead
Informatics - 10 Years Back. 10 Years Ahead.
Design methodology for a modular service-driven network processor architecture
Computer Networks: The International Journal of Computer and Telecommunications Networking - Network processors
Parallel ray tracing on a chip
Practical parallel rendering
Fine-grain design space exploration for a cartographic SoC multiprocessor
ACM SIGARCH Computer Architecture News
RTAS '03 Proceedings of the The 9th IEEE Real-Time and Embedded Technology and Applications Symposium
Guaranteeing the quality of services in networks on chip
Networks on chip
On packet switched networks for on-chip communication
Networks on chip
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Min-cut program decomposition for thread-level speculation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
The energy efficiency of CMP vs. SMT for multimedia workloads
Proceedings of the 18th annual international conference on Supercomputing
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Decoupled Software Pipelining with the Synchronization Array
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Conjoined-Core Chip Multiprocessing
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Programming Configurable Multiprocessors
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing inter-processor data locality on embedded chip multiprocessors
Proceedings of the 5th ACM international conference on Embedded software
Temperature-Sensitive Loop Parallelization for Chip Multiprocessors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Exploring the cache design space for large scale CMPs
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Dynamically configurable shared CMP helper engines for improved performance
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Code restructuring for improving cache performance of MPSoCs
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Dynamic partitioning of processing and memory resources in embedded MPSoC architectures
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Throttling-Based Resource Management in High Performance Multithreaded Architectures
IEEE Transactions on Computers
Thread-associative memory for multicore and multithreaded computing
Proceedings of the 2006 international symposium on Low power electronics and design
Improving the performance and power efficiency of shared helpers in CMPs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Software—Practice & Experience
Supporting microthread scheduling and synchronisation in CMPs
International Journal of Parallel Programming
Computational models for image processing for shared-memory multiprocessors
Integrated Computer-Aided Engineering
A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Provably good multicore cache performance for divide-and-conquer algorithms
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Fpga-based prototype of a pram-on-chip processor
Proceedings of the 5th conference on Computing frontiers
Software-directed combined cpu/link voltage scaling fornoc-based cmps
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Utilizing shared data in chip multiprocessors with the Nahalal architecture
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Software-Controlled Priority Characterization of POWER5 Processor
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Memory hierarchy performance measurement of commercial dual-core desktop processors
Journal of Systems Architecture: the EUROMICRO Journal
Towards achieving reliable and high-performance nanocomputing via dynamic redundancy allocation
ACM Journal on Emerging Technologies in Computing Systems (JETC)
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers I
Exploiting multithreaded architectures to improve the hash join operation
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Improving error tolerance for multithreaded register files
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploring multicore computing education in China by model curriculum construction
SCE '08 Proceedings of the 1st ACM Summit on Computing Education in China on First ACM Summit on Computing Education in China
Proceedings of the 6th ACM conference on Computing frontiers
FlexDCP: a QoS framework for CMP architectures
ACM SIGOPS Operating Systems Review
SPMTM: A Novel ScratchPad Memory Based Hybrid Nested Transactional Memory Framework
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Allocation wall: a limiting factor of Java applications on emerging multi-core platforms
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Can transactions enhance parallel programs?
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
MLP-aware dynamic cache partitioning
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A hyperscalar multi-core architecture
Proceedings of the 7th ACM international conference on Computing frontiers
Global management of cache hierarchies
Proceedings of the 7th ACM international conference on Computing frontiers
Exploit temporal locality of shared data in SRC enabled CMP
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
HiPC'08 Proceedings of the 15th international conference on High performance computing
Area-efficient floorplans and interconnects for homogeneous multi-core architectures
International Journal of High Performance Systems Architecture
A case for FAME: FPGA architecture model execution
Proceedings of the 37th annual international symposium on Computer architecture
Tale in the multi-core era: is java still competitive to host SIP applications?
ICC'09 Proceedings of the 2009 IEEE international conference on Communications
Dynamic processors demand dynamic operating systems
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Process variation aware thread mapping for chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
SCMP architecture: an asymmetric multiprocessor system-on-chip for dynamic applications
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Low Power Design for a Multi-core Multi-thread Microprocessor
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Dynamic cache partitioning based on the MLP of cache misses
Transactions on high-performance embedded architectures and compilers III
Power-aware dynamic cache partitioning for CMPs
Transactions on high-performance embedded architectures and compilers III
A composite and scalable cache coherence protocol for large scale CMPs
Proceedings of the international conference on Supercomputing
The ReNoC Reconfigurable Network-on-Chip: Architecture, Configuration Algorithms, and Evaluation
ACM Transactions on Embedded Computing Systems (TECS)
Computers and Electrical Engineering
RCMP: a reconfigurable chip-multiprocessor architecture
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Parallel image segmentation in reconfigurable chip multiprocessors
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Acceleration techniques for chip-multiprocessor simulator debug
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Compositional constraints generation for concurrent real-time loops with interdependent iterations
IICS'05 Proceedings of the 5th international conference on Innovative Internet Community Systems
JAHUEL: a formal framework for software synthesis
ICFEM'05 Proceedings of the 7th international conference on Formal Methods and Software Engineering
The design space of CMP vs. SMT for high performance embedded processor
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
Efficient and scalable scheduling for performance heterogeneous multicore systems
Journal of Parallel and Distributed Computing
Low power microprocessor design for embedded systems
ICCSA'06 Proceedings of the 2006 international conference on Computational Science and Its Applications - Volume Part IV
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Design and analysis of adaptive processor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Function units sharing between neighbor cores in CMP
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
An architecture for software-based iSCSI: experiences and analyses
NETWORKING'05 Proceedings of the 4th IFIP-TC6 international conference on Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communication Systems
Polymorphic contention management
DISC'05 Proceedings of the 19th international conference on Distributed Computing
A shared matrix unit for a chip multi-core processor
Journal of Parallel and Distributed Computing
A hyperscalar dual-core architecture for embedded systems
Microprocessors & Microsystems
(n-3)-edge-fault-tolerant weak-pancyclicity of (n,k)-star graphs
Theoretical Computer Science
Hi-index | 4.11 |
These Stanford University researchers present the case for billion-transistor processor architectures that will consist of chip multiprocessors (CMPs): multiple (four to 16) simple, fast processors on one chip. In their proposal, each processor is tightly coupled to a small, fast, level-one cache, and all processors share a larger level-two cache. The processors may collaborate on a parallel job or run independent tasks (as in the SMT proposal). The CMP architecture lends itself to simpler design, faster validation, cleaner functional partitioning, and higher theoretical peak performance. However for this architecture to realize its performance potential, either programmers or compilers will have to make code explicitly parallel. Old ISAs will be incompatible with this architecture (although they could run slowly on one of the small processors).