An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Tradeoffs in two-level on-chip caching
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Alpha AXP architecture reference manual (2nd ed.)
Alpha AXP architecture reference manual (2nd ed.)
S-connect: from networks of workstations to supercomputer performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Studies of Windows NT performance using dynamic execution traces
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
ICS '98 Proceedings of the 12th international conference on Supercomputing
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Performance of an OLTP application on symmetry multiprocessor system
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Computer
The Future of Systems Research
Computer
System Optimization for OLTP Workloads
IEEE Micro
Exploiting Parallelism in Cache Coherency Protocol Engines
Euro-Par '95 Proceedings of the First International Euro-Par Conference on Parallel Processing
The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Architecture and design of AlphaServer GS320
ACM SIGPLAN Notices
Timestamp snooping: an approach for extending SMPs
ACM SIGPLAN Notices
Thread-level parallelism and interactive performance of desktop applications
ACM SIGPLAN Notices
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Timestamp snooping: an approach for extending SMPs
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Thread-level parallelism and interactive performance of desktop applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Code layout optimizations for transaction processing workloads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An adaptive chip-multiprocessor architecture for future mobile terminals
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer
International Journal of Parallel Programming
Speculative Multithreaded Processors
Computer
A survey of processors with explicit multithreading
ACM Computing Surveys (CSUR)
Parallel simulation of chip-multiprocessor architectures
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Instruction Level Distributed Processing
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Specification and Verification Framework for Developing Weak Shared Memory Consistency Protocols
FMCAD '02 Proceedings of the 4th International Conference on Formal Methods in Computer-Aided Design
In-memory Parallelism for Database Workloads
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Efficient synchronization for nonuniform communication architectures
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Dissecting Cyclops: a detailed analysis of a multithreaded architecture
ACM SIGARCH Computer Architecture News
Hierarchical Backoff Locks for Nonuniform Communication Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Memory System Behavior of Java-Based Middleware
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
An evaluation of speculative instruction execution on simultaneous multithreaded processors
ACM Transactions on Computer Systems (TOCS)
Verifying Sequential Consistency on Shared-Memory Multiprocessors by Model Checking
IEEE Transactions on Parallel and Distributed Systems
Scaling and Charact rizing Database Workloads: Bridging the Gap between Research and Practice
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
The Impact of Negative Acknowledgments in Shared Memory Scientific Applications
IEEE Transactions on Parallel and Distributed Systems
Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors
Proceedings of the conference on Design, automation and test in Europe - Volume 2
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP
ACM Transactions on Architecture and Code Optimization (TACO)
Packetization and routing analysis of on-chip multiprocessor networks
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Balancing design options with Sherpa
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
A case for shared instruction cache on chip multiprocessors running OLTP
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Conjoined-Core Chip Multiprocessing
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Reducing misspeculation overhead for module-level speculative execution
Proceedings of the 2nd conference on Computing frontiers
Cache coherence support for non-shared bus architecture on heterogeneous MPSoCs
Proceedings of the 42nd annual Design Automation Conference
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Distance-aware L2 cache organizations for scalable multiprocessor systems
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Reconfigurable embedded systems: Synthesis, design and application
Fast and fair: data-stream quality of service
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Maximizing CMP Throughput with Mediocre Cores
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Characterization of TCC on Chip-Multiprocessors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Queue - Multiprocessors
Queue - Multiprocessors
Temperature-Sensitive Loop Parallelization for Chip Multiprocessors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Heterogeneous Chip Multiprocessors
Computer
Performance/Watt: the new server focus
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Performance implications of single thread migration on a chip multi-core
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
International Journal of Parallel Programming
Constructing Virtual Architectures on a Tiled Processor
Proceedings of the International Symposium on Code Generation and Optimization
Chip multiprocessing and the cell broadband engine
Proceedings of the 3rd conference on Computing frontiers
Code restructuring for improving cache performance of MPSoCs
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Chip multithreading systems need a new operating system scheduler
Proceedings of the 11th workshop on ACM SIGOPS European workshop
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Dense Gaussian networks: suitable topologies for on-chip multiprocessors
International Journal of Parallel Programming
A case for chip multiprocessors based on the data-driven multithreading model
International Journal of Parallel Programming
AVIO: detecting atomicity violations via access interleaving invariants
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Supporting microthread scheduling and synchronisation in CMPs
International Journal of Parallel Programming
Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors
Proceedings of the 20th annual international conference on Supercomputing
TMA: a trap-based memory architecture
Proceedings of the 20th annual international conference on Supercomputing
Reunion: Complexity-Effective Multicore Redundancy
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Coherence Ordering for Ring-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
Proximity-aware directory-based coherence for multi-core processor architectures
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Configurable isolation: building high availability systems with commodity multi-core processors
Proceedings of the 34th annual international symposium on Computer architecture
The Power of Priority: NoC Based Distributed Cache Coherency
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Hardware Support for Accelerating Data Movement in Server Platform
IEEE Transactions on Computers
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
Proceedings of the 21st annual international conference on Supercomputing
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
Analysis of dynamic voltage/frequency scaling in chip-multiprocessors
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Design of adaptive multiprocessor on chip systems
Proceedings of the 20th annual conference on Integrated circuits and systems design
Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Characterization of Apache web server with Specweb2005
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor
International Journal of Parallel Programming
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Provably good multicore cache performance for divide-and-conquer algorithms
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Power management of variation aware chip multiprocessors
Proceedings of the 18th ACM Great Lakes symposium on VLSI
TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Towards hybrid last level caches for chip-multiprocessors
ACM SIGARCH Computer Architecture News
Communications of the ACM - Remembering Jim Gray
Queue - The Concurrency Problem
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Making secure processors OS- and performance-friendly
ACM Transactions on Architecture and Code Optimization (TACO)
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers I
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
A multigrain Delaunay mesh generation method for multicore SMT-based architectures
Journal of Parallel and Distributed Computing
Reactive NUCA: near-optimal block placement and replication in distributed caches
Proceedings of the 36th annual international symposium on Computer architecture
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
Triplet-based topology for on-chip networks
WSEAS Transactions on Computers
A scalable micro wireless interconnect structure for CMPs
Proceedings of the 15th annual international conference on Mobile computing and networking
Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
An Efficient Lightweight Shared Cache Design for Chip Multiprocessors
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Reconsidering algorithms for iterative solvers in the multicore era
International Journal of Computational Science and Engineering
Increasing memory miss tolerance for SIMD cores
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Research on synthesis parameter real-time scheduling algorithm on multi-core architecture
CCDC'09 Proceedings of the 21st annual international conference on Chinese control and decision conference
Mitigating the impact of variability on chip-multiprocessor power and performance
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A scalable organization for distributed directories
Journal of Systems Architecture: the EUROMICRO Journal
Evaluation of OpenMP for the cyclops multithreaded architecture
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Efficient methods for formally verifying safety properties of hierarchical cache coherence protocols
Formal Methods in System Design
Proceedings of the 7th ACM international conference on Computing frontiers
The SKB: a semi-completely-connected bus for on-chip systems
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
Fault-tolerant cache coherence protocols for CMPs: evaluation and trade-offs
HiPC'08 Proceedings of the 15th international conference on High performance computing
Evolution of thread-level parallelism in desktop applications
Proceedings of the 37th annual international symposium on Computer architecture
Web search using mobile cores: quantifying and mitigating the price of efficiency
Proceedings of the 37th annual international symposium on Computer architecture
Cohesion: a hybrid memory model for accelerators
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 7th International Conference on Frontiers of Information Technology
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
TurboTag: lookup filtering to reduce coherence directory power
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Light NUCA: a proposal for bridging the inter-cache latency gap
Proceedings of the Conference on Design, Automation and Test in Europe
Group-caching for NoC based multicore cache coherent systems
Proceedings of the Conference on Design, Automation and Test in Europe
A study of the on-chip interconnection network for the IBM Cyclops64 multi-core architecture
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Reconfiguration of embedded java applications
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Fractal Coherence: Scalably Verifiable Cache Coherence
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Memory-, bandwidth-, and power-aware multi-core for a graph database workload
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Research note: C-AMTE: A location mechanism for flexible cache management in chip multiprocessors
Journal of Parallel and Distributed Computing
Green secure processors: towards power-efficient secure processor design
Transactions on computational science X
Mobile processors for energy-efficient web search
ACM Transactions on Computer Systems (TOCS)
Energy-efficient cache coherence protocol for NoC-based MPSoCs
Proceedings of the 24th symposium on Integrated circuits and systems design
Scheduling javasymphony applications on many-core parallel computers
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Filtering directory lookups in CMPs with write-through caches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Processing (multiple) spatio-temporal range queries in multicore settings
ADBIS'11 Proceedings of the 15th international conference on Advances in databases and information systems
Filtering directory lookups in CMPs
Microprocessors & Microsystems
L2-Cache hierarchical organizations for multi-core architectures
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
RCMP: a reconfigurable chip-multiprocessor architecture
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Memory subsystem characterization in a 16-core snoop-based chip-multiprocessor architecture
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Speeding-up synchronizations in DSM multiprocessors
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Function units sharing between neighbor cores in CMP
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Chameleon: operating system support for dynamic processors
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
DDM-CMP: data-driven multithreading on a chip multiprocessor
SAMOS'05 Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Manager-client pairing: a framework for implementing coherence hierarchies
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Performance and power aware CMP thread allocation modeling
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Proceedings of the 39th Annual International Symposium on Computer Architecture
Spatiotemporal Coherence Tracking
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting process variability in voltage/frequency control
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Dynamic directories: a mechanism for reducing on-chip interconnect power in multicores
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Multi-grain coherence directories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Market mechanisms for managing datacenters with heterogeneous microarchitectures
ACM Transactions on Computer Systems (TOCS)
DP&TB: a coherence filtering protocol for many-core chip multiprocessors
The Journal of Supercomputing
Hi-index | 0.01 |
The microprocessor industry is currently struggling with higher development costs and longer design times that arise from exceedingly complex processors that are pushing the limits of instruction-level parallelism. Meanwhile, such designs are especially ill suited for important commercial applications, such as on-line transaction processing (OLTP), which suffer from large memory stall times and exhibit little instruction-level parallelism. Given that commercial applications constitute by far the most important market for high-performance servers, the above trends emphasize the need to consider alternative processor designs that specifically target such workloads. The abundance of explicit thread-level parallelism in commercial workloads, along with advances in semiconductor integration density, identify chip multiprocessing (CMP) as potentially the most promising approach for designing processors targeted at commercial servers. This paper describes the Piranha system, a research prototype being developed at Compaq that aggressively exploits chip multi-processing by integrating eight simple Alpha processor cores along with a two-level cache hierarchy onto a single chip. Piranha also integrates further on-chip functionality to allow for scalable multiprocessor configurations to be built in a glueless and modular fashion. The use of simple processor cores combined with an industry-standard ASIC design methodology allow us to complete our prototype within a short time-frame, with a team size and investment that are an order of magnitude smaller than that of a commercial microprocessor. Our detailed simulation results show that while each Piranha processor core is substantially slower than an aggressive next-generation processor, the integration of eight cores onto a single chip allows Piranha to outperform next-generation processors by up to 2.9 times (on a per chip basis) on important workloads such as OLTP. This performance advantage can approach a factor of five by using full-custom instead of ASIC logic. In addition to exploiting chip multiprocessing, the Piranha prototype incorporates several other unique design choices including a shared second-level cache with no inclusion, a highly optimized cache coherence protocol, and a novel I/O architecture.