IBM eServer z900 high-frequency microprocessor technology, circuits, and design methodology
IBM Journal of Research and Development
IBM Journal of Research and Development
Power-constrained high-frequency circuits for the IBM POWER6 microprocessor
IBM Journal of Research and Development
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Empirical performance models for 3T1D memories
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Light NUCA: a proposal for bridging the inter-cache latency gap
Proceedings of the Conference on Design, Automation and Test in Europe
Comparing FPGA vs. custom cmos and the impact on processor microarchitecture
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
IBM Journal of Research and Development
LP-NUCA: networks-in-cache for high-performance low-power embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
The IBM POWER6™ microprocessor presented new challenges to array design because of its high-frequency requirement and its use of 65-nm silicon-on-insulator (SOI) technology. Advancements in performance (2X to 3X improvement over the 90-nm generation) and design margins (cell stability, writability, and redundancy coverage) were major focus areas. Key elements of the POWER6 processor chip arrays include paradigm shifts such as thin memory cell layout, large signal read (without a sense amplifier), segmented bitline structure, unclamped column-half-select scheme, multidimensional programmable timing control, and separate elevated static random access memory (SRAM) power supply. There are two main array categories on the POWER6 microprocessor chip: core and nest. Processor core arrays use a single-port, 0.75-µm2, six-transistor (6T) cell and operate at full frequency, whereas the surrounding nest arrays use a smaller 0.65-µm2 cell that operates at half or one-quarter of the core frequency in order to achieve better density and power efficiency. The core arrays include the 96-KB instruction cache (I-cache) and the 64-KB data cache (D-cache), with associate lookup-path SRAM macros. The I-cache is a four-way set-associative, single-port design, whereas the D-cache is an eight-way design with dual read ports to handle multithreading capability. The lookup-path arrays contain content-addressable memory (CAM) and RAM macros with integrated dynamic hit logic circuitry. In the nest portion, an 8-MB level 2 (L2) D-cache and a level 3 (L3) directory (1.2 MB) make up the largest arrays. The latter macro designs use longer bitlines and orthogonal word-decode layouts to achieve high array-area efficiency.