Preventing TCP incast throughput collapse at the initiation, continuation, and termination

Authors:
Adrian S.-W. Tam; Kang Xi; Yang Xu; H. Jonathan Chao
Affiliations:
Polytechnic Institute of New York University (all authors)
Venue:
Proceedings of the 2012 IEEE 20th International Workshop on Quality of Service
Year:
2012

Abstract

Incast applications have grown in popularity with the advancement of data center technology. TCP incast traffic has been found to suffer from throughput collapse, a consequence of TCP retransmission timeouts that occur when the bottleneck buffer is overwhelmed and packets are lost. This problem is critical to the Quality of Service of cloud computing applications. Although previous work has proposed solutions, the problem has not been completely solved. In this paper, we investigate the three root causes of the poor performance of TCP incast flows and propose three solutions, one each for the beginning, the middle, and the end of a TCP connection. The three solutions are: admission control of TCP flows so that the flow population does not exceed the network's capacity; timestamp-based retransmission to detect the loss of retransmitted packets; and reiterated FIN packets that keep the TCP connection active until the termination of the session is acknowledged. Orchestrating these solutions prevents throughput collapse. Their main idea is to ensure that all ongoing TCP incast flows maintain self-clocking, thus eliminating the need to resort to retransmission timeouts for recovery. Our evaluation shows that these solutions work well in preventing retransmission timeouts of TCP incast flows, and hence in preventing throughput collapse.
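The abstract does not say how the network's capacity is computed for admission control. As a rough illustration only, the following Python sketch assumes a hypothetical capacity bound: each admitted flow must keep at least `min_window_pkts` packets in flight to stay self-clocked, and the bottleneck can hold at most its buffer plus the bandwidth-delay product. The names `max_concurrent_flows` and `IncastAdmissionController`, and the FIFO deferral of excess flows, are illustrative assumptions, not the authors' design.

```python
from __future__ import annotations

from collections import deque
from typing import Deque, Optional, Set


def max_concurrent_flows(buffer_pkts: int, bdp_pkts: int, min_window_pkts: int) -> int:
    """Hypothetical capacity bound for admission control.

    Assumes every admitted flow keeps at least `min_window_pkts` packets in
    flight, and that the bottleneck can hold `buffer_pkts` queued packets plus
    `bdp_pkts` packets in the pipe without dropping any.
    """
    if min_window_pkts <= 0:
        raise ValueError("min_window_pkts must be positive")
    return (buffer_pkts + bdp_pkts) // min_window_pkts


class IncastAdmissionController:
    """Admit at most `limit` concurrent incast flows; defer the rest in FIFO order."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.active: Set[str] = set()
        self.waiting: Deque[str] = deque()

    def request(self, flow_id: str) -> bool:
        """Return True if the flow may start now; otherwise queue it and return False."""
        if len(self.active) < self.limit:
            self.active.add(flow_id)
            return True
        self.waiting.append(flow_id)
        return False

    def finish(self, flow_id: str) -> Optional[str]:
        """Mark a flow as finished and admit the oldest waiting flow, if any."""
        self.active.discard(flow_id)
        if self.waiting and len(self.active) < self.limit:
            next_flow = self.waiting.popleft()
            self.active.add(next_flow)
            return next_flow
        return None


if __name__ == "__main__":
    # Illustrative numbers only: 8-packet buffer, 4-packet pipe, 4-packet minimum window.
    ctl = IncastAdmissionController(max_concurrent_flows(8, 4, 4))
    print([ctl.request(f"f{i}") for i in range(5)])  # [True, True, True, False, False]
    print(ctl.finish("f1"))  # f3
```

With these illustrative numbers the bound admits three flows; the fourth and fifth requests wait until earlier flows finish. In this sketch, deferring rather than rejecting excess flows keeps every admitted flow's window at or above the assumed minimum, at the cost of added start-up latency for the late flows.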