Understanding Scalability and Performance Requirements of I/O-Intensive Applications on Future Multicore Servers

Authors:
Shoaib Akram;Manolis Marazakis;Angelos Bilas
Venue:
MASCOTS '12: Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Year:
2012

Abstract

Today, there is increased interest in understanding the impact of data-centric applications on compute and storage infrastructures, as datasets are projected to grow dramatically. In this paper, we examine the storage I/O behavior of twelve data-centric applications as the number of cores per server grows. We configure these applications with realistic datasets and examine configuration points where they perform a significant amount of I/O. We propose cycles per I/O (cpio) as a metric that abstracts away many I/O subsystem configuration details. We analyze specific architectural issues pertaining to data-centric applications, including the usefulness of hyper-threading, sensitivity to memory bandwidth, and the potential impact of disruptive storage technologies. Our results show that today's data-centric applications do not scale with the number of cores: moving from one to eight cores results in 0% to 400% more cycles per I/O operation. These applications achieve much of their performance with only 50% of the memory bandwidth available on modern processors. Hyper-threading is highly effective for these applications: on average, they suffer only a 15% reduction in performance when hyper-threads are used instead of full cores. Further, DRAM-type persistent memory has the potential to solve scalability bottlenecks by reducing or eliminating idle and I/O completion periods, thereby improving server utilization. Using a detailed methodology, we project that in 2020, servers with 4096 processors will require between 250 and 500 GB/s of I/O bandwidth under optimistic scaling assumptions. We show that if the current trend in application scalability is not reversed, about 2.5 million servers consuming 10 billion kWh of energy will be needed to make a single pass over the 35 zettabytes (ZB) of data projected for 2020.
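A minimal Python sketch of the cpio metric and of the arithmetic behind the 2020 projection. The abstract does not say how cycles and I/Os are counted, so the sketch assumes total cycles = cores × clock frequency × elapsed time and counts I/O operations in 512-byte units. The names `cpio`, `pct_increase`, and `SECTOR_BYTES` are hypothetical, and the 1-core and 8-core inputs are invented for illustration; they are not measurements from the paper. The last three constants are the figures quoted in the abstract, reading "10 billion kWh" as 1e10 kWh.

```python
# Hypothetical sketch of cycles per I/O (cpio) and the abstract's 2020 projection arithmetic.
# ASSUMPTIONS (not from the paper): total cycles = cores * clock_hz * elapsed_s,
# and I/O operations = bytes transferred / 512-byte sector.

SECTOR_BYTES = 512  # assumed I/O accounting unit


def cpio(cores: int, clock_hz: float, elapsed_s: float, bytes_transferred: int) -> float:
    """Cycles per I/O: CPU cycles available during the run divided by I/O operations."""
    total_cycles = cores * clock_hz * elapsed_s
    io_ops = bytes_transferred / SECTOR_BYTES
    return total_cycles / io_ops


def pct_increase(base: float, scaled: float) -> float:
    """Percent change in cpio when moving from a base to a scaled configuration."""
    return (scaled - base) / base * 100.0


# Figures quoted in the abstract.
SERVERS = 2.5e6       # 2.5 million servers
ENERGY_KWH = 10e9     # 10 billion kWh
DATA_BYTES = 35e21    # 35 zettabytes

energy_per_server_kwh = ENERGY_KWH / SERVERS   # kWh each server consumes for one pass
data_per_server_bytes = DATA_BYTES / SERVERS   # bytes each server must process in one pass

if __name__ == "__main__":
    # Invented example: the same 10 GiB of I/O on 1 core (100 s) vs 8 cores (25 s).
    data = 10 * 2**30
    one_core = cpio(cores=1, clock_hz=2.0e9, elapsed_s=100.0, bytes_transferred=data)
    eight_core = cpio(cores=8, clock_hz=2.0e9, elapsed_s=25.0, bytes_transferred=data)
    print(f"cpio increase 1->8 cores: {pct_increase(one_core, eight_core):.0f}%")
    print(f"energy per server: {energy_per_server_kwh:.0f} kWh")
    print(f"data per server: {data_per_server_bytes / 1e15:.1f} PB")
```

Under this assumed definition, a run that finishes in a quarter of the time on eight times as many cores shows a 100% cpio increase: each I/O costs twice as many cycles. The projection arithmetic gives roughly 4000 kWh and 14 PB per server for a single pass over the data.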