When the CRC and TCP checksum disagree

Authors:
Jonathan Stone; Craig Partridge
Affiliations:
Stanford Distributed Systems Group; BBN Technologies
Venue:
Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
Year:
2000


Abstract

Traces of Internet packets from the past two years show that between 1 packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on links where link-level CRCs should catch all but 1 in 4 billion errors. In certain situations the rate of checksum failures can be even higher: in one hour-long test we observed checksum failures in 1 packet in 400. We investigate why so many errors are observed when link-level CRCs should catch nearly all of them.

We have collected nearly 500,000 packets that failed the TCP, UDP, or IP checksum. This dataset shows that the Internet has a wide variety of error sources that cannot be detected by link-level checks. We describe analysis tools that have identified nearly 100 different error patterns. By categorizing packet errors, we can infer likely causes that explain roughly half the observed errors. The causes span the entire spectrum of a network stack, from memory errors to bugs in TCP.

After an analysis, we conclude that the checksum will fail to detect errors for roughly 1 in 16 million to 1 in 10 billion packets. From our analysis of the causes of errors, we propose simple changes to several protocols that will decrease the rate of undetected error. Even so, the highly non-random distribution of errors strongly suggests that some applications should employ application-level checksums or equivalents.
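
To illustrate why a non-random error pattern can slip past the TCP checksum while a CRC still catches it, the following minimal Python sketch computes the 16-bit ones' complement Internet checksum and a CRC-32 over the same payload. The payload, the function name `internet_checksum`, and the corruption (two aligned 16-bit words trading places) are assumptions chosen for illustration; they are not taken from the paper's dataset or tools.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload, used only for illustration.
original = b"GET /index.html HTTP/1.0\r\n"

# Hypothetical corruption: the first two aligned 16-bit words swap places.
corrupted = original[2:4] + original[0:2] + original[4:]

# The ones' complement sum is commutative, so word order does not affect it.
same_checksum = internet_checksum(original) == internet_checksum(corrupted)

# CRC-32 depends on bit positions, so the swap changes its value.
same_crc = zlib.crc32(original) == zlib.crc32(corrupted)

print("payload changed:     ", original != corrupted)
print("checksum still equal:", same_checksum)
print("CRC-32 still equal:  ", same_crc)
```

Because the checksum is a sum of 16-bit words, any error that reorders words or adds offsetting values leaves it unchanged. This is one reason an application-level check with different mathematical structure, such as a CRC over the application data, can catch errors that the transport checksum misses.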