Memory hierarchy performance measurement of commercial dual-core desktop processors

  • Authors:
  • Lu Peng; Jih-Kwon Peir; Tribuvan K. Prakash; Carl Staelin; Yen-Kuang Chen; David Koppelman

  • Affiliations:
  • Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, United States; Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, United States; Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, United States; Hewlett-Packard Laboratories, Technion City, Haifa 32000, Israel; Architecture Research Laboratory, Intel Corporation, Santa Clara, CA 95052, United States; Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, United States

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2008

Abstract

As chip multiprocessors (CMPs) have become mainstream in processor architecture, Intel and AMD have introduced dual-core processors. In this paper, we report performance measurements of an Intel Core 2 Duo, an Intel Pentium D, and an AMD Athlon 64 X2 processor. According to their design specifications, these dual-core processors differ in key aspects of the critical memory hierarchy architecture. In addition to measuring overall execution time and throughput with both multiprogrammed and multithreaded workloads, this paper provides a detailed analysis of memory hierarchy performance and of performance scalability from single-core to dual-core execution. Our results indicate that for better performance and scalability, it is important to have (1) fast cache-to-cache communication, (2) large L2 or shared capacity, (3) fast L2-to-core latency, and (4) fair cache resource sharing. Each of the three dual-core processors we studied benefits from some of these factors, but none from all of them. Core 2 Duo performs best on most of the workloads because of microarchitectural features such as its shared L2 cache. Pentium D performs worst in many respects because it is a technology remap of the Pentium 4 that does not take advantage of on-chip communication.
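The abstract singles out L2-to-core latency as a key factor. A common way to measure load-to-use latency at each level of a memory hierarchy is pointer chasing: walk a randomly permuted linked chain so that every load depends on the previous one and prefetchers cannot predict the next address. The C code below is a minimal single-threaded sketch of that general technique, not the authors' measurement tool. It assumes 64-byte cache lines and POSIX `clock_gettime`, and the names `build_chain`, `chase`, and `ns_per_load` are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64                                  /* assumed cache-line size */
#define WORDS_PER_LINE (LINE_BYTES / sizeof(size_t))   /* one chain node per line */

/* Marsaglia xorshift64 PRNG; the state must be nonzero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Build one random cycle through n_lines cache lines. buf[k * W] holds the
   word index of the next line to visit, so every load touches a new line and
   the random order defeats stride prefetchers. Returns NULL on failure. */
static size_t *build_chain(size_t n_lines, uint64_t seed) {
    if (n_lines == 0 || seed == 0) return NULL;
    size_t *order = malloc(n_lines * sizeof *order);
    size_t *buf = calloc(n_lines * WORDS_PER_LINE, sizeof *buf);
    if (!order || !buf) { free(order); free(buf); return NULL; }
    for (size_t i = 0; i < n_lines; i++) order[i] = i;
    for (size_t i = n_lines - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        size_t j = (size_t)(xorshift64(&seed) % (i + 1));
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n_lines; i++)              /* link into one cycle */
        buf[order[i] * WORDS_PER_LINE] = order[(i + 1) % n_lines] * WORDS_PER_LINE;
    free(order);
    return buf;
}

/* Each load's address depends on the previous load, so misses cannot overlap. */
static size_t chase(const size_t *buf, size_t start, size_t steps) {
    size_t p = start;
    for (size_t s = 0; s < steps; s++) p = buf[p];
    return p;
}

/* Average nanoseconds per dependent load over a working set of `bytes`. */
static double ns_per_load(size_t bytes, size_t steps) {
    size_t n_lines = bytes / LINE_BYTES;
    assert(n_lines > 0 && steps > 0);
    size_t *buf = build_chain(n_lines, 0x9E3779B97F4A7C15ULL);
    if (!buf) return -1.0;
    volatile size_t sink = chase(buf, 0, n_lines);    /* warm-up pass */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(buf, 0, steps);                      /* volatile keeps the loop */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    free(buf);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Calling `ns_per_load` for working sets that grow from below the L1 size to well beyond the L2 capacity typically shows a step in latency at each cache boundary, which is how L1, L2, and memory latencies can be separated. Measuring cache-to-cache communication, the first factor the abstract lists, would additionally require a second thread pinned to the other core; this sketch does not cover that.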