In this paper, we present an advanced multiprocessor cache architecture for chip multiprocessors (CMPs). It is designed for the scalable GigaNetIC CMP, which is based on massively parallel on-chip computing clusters. Our write-through multiprocessor cache is configurable with respect to the most relevant design options. It is intended for use in universal co-processors as well as in network processing units. For early verification of the software and early exploration of various hardware configurations, we have developed a SystemC-based simulation model of the complete chip multiprocessor. For detailed hardware-software co-verification, we use our FPGA-based rapid prototyping system RAPTOR2000 to emulate our architecture with near-ASIC performance. Finally, we demonstrate the performance gains that our multiprocessor cache enables in different application scenarios.