On the core affinity and file upload performance of Hadoop

  • Authors:
  • Joong-Yeon Cho;Hyun-Wook Jin;Min Lee;Karsten Schwan

  • Affiliations:
  • Konkuk University, Seoul, Korea;Konkuk University, Seoul, Korea;Georgia Institute of Technology, Atlanta, GA;Georgia Institute of Technology, Atlanta, GA

  • Venue:
  • DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
  • Year:
  • 2013


Abstract

The MapReduce programming model was introduced for big-data processing, where data nodes perform both data storage and computation. We therefore need to understand the different resource requirements of storage and computation tasks and schedule them efficiently over multi-core processors. Core affinity defines the mapping between a set of cores and a given task; it can be decided based on the task's resource requirements, because this mapping largely affects the efficiency of computation, memory, and I/O resource utilization. In this paper, we analyze the impact of core affinity on the file upload performance of the Hadoop Distributed File System (HDFS). Our study provides insight into process scheduling issues in big-data processing systems. We also suggest a framework for dynamic core affinity based on our observations and show that a preliminary implementation can improve throughput by more than 40% compared with the default Linux system.
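
For readers unfamiliar with the mechanism the abstract refers to: on Linux, core affinity is typically applied with the sched_setaffinity(2) system call (or the taskset utility). The sketch below only illustrates that standard mechanism; it is not the authors' dynamic-affinity framework, and the choice of cores 0 and 1 and of pinning the calling thread is an arbitrary assumption for illustration.

```c
/*
 * Minimal sketch (assumption: Linux, glibc): pin the calling thread to a
 * chosen set of cores with sched_setaffinity(2). A dynamic core-affinity
 * framework like the one the paper suggests would instead pick the core
 * set at run time from the task's resource profile (I/O- vs. compute-bound).
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);      /* start with an empty core set */
    CPU_SET(0, &mask);    /* allow core 0 */
    CPU_SET(1, &mask);    /* allow core 1 */

    /* pid 0 means the calling thread; the kernel will now schedule it
       only on cores 0 and 1 */
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }

    printf("pinned to cores 0 and 1\n");
    return 0;
}
```

From the command line, a similar effect can be obtained with `taskset -c 0,1 <command>`, which is a common way to constrain JVM processes such as HDFS DataNodes to a given core set.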