The limits of automatic OS fingerprint generation

Authors:
David W. Richardson;Steven D. Gribble;Tadayoshi Kohno
Affiliations:
University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA
Venue:
Proceedings of the 3rd ACM workshop on Artificial intelligence and security
Year:
2010

References:

C4.5: programs for machine learning
Automated packet trace analysis of TCP implementations. SIGCOMM '97 Proceedings of the ACM SIGCOMM '97 conference on Applications, technologies, architectures, and protocols for computer communication
On inferring TCP behavior. Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications
Remote Physical Device Fingerprinting. SP '05 Proceedings of the 2005 IEEE Symposium on Security and Privacy
A virtual honeypot framework. SSYM'04 Proceedings of the 13th conference on USENIX Security Symposium - Volume 13
Probing TCP implementations. USTC'94 Proceedings of the USENIX Summer 1994 Technical Conference - Volume 1
Passive data link layer 802.11 wireless device driver fingerprinting. USENIX-SS'06 Proceedings of the 15th conference on USENIX Security Symposium - Volume 15
Toward undetected operating system fingerprinting. WOOT '07 Proceedings of the first USENIX workshop on Offensive Technologies
The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter

Abstract

Remote operating system fingerprinting relies on implementation differences between OSs to identify the specific variant executing on a remote host. Because these differences can be subtle and difficult to find, most fingerprinting tools require expert manual effort to construct discriminative fingerprints and classification models. In prior work, Caballero et al. proposed a promising technique to eliminate manual intervention: the automatic generation of fingerprints using an approach similar to fuzz testing [6]. Their work evaluated the technique in a small-scale, carefully controlled test environment. In this paper, we re-examine automatic OS fingerprinting in a more challenging large-scale scenario to better understand the viability of the technique. In contrast to the prior work, we find that automatic fingerprint generation faces several limitations and technical hurdles that can reduce its effectiveness, particularly in more demanding, realistic environments. We use machine learning algorithms from the well-known Weka [11] data mining toolkit to automatically generate fingerprints over 329 different machine instances, and we compare the accuracy of our automatically generated fingerprints to that of Nmap. Our results suggest that four factors significantly limit the usefulness of this technique in practice: overfitting to non-OS-specific behavioral differences, the indistinguishability of different OS variants, the bias of an automatic tool toward the makeup of its training data, and the inability of an automatic tool to exploit protocol and software semantics. Automatic techniques can help identify candidate signatures, but our results suggest that manual expertise will remain an integral part of fingerprint generation.
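
The overfitting concern raised in the abstract can be illustrated with a toy example. The sketch below is not the authors' method and does not use Weka; it is a minimal one-rule learner written for illustration only. The feature names (initial_ttl, path_mtu), the host records, and the labels are all invented assumptions. The point it tries to show is that a learner that only maximizes training accuracy may prefer a feature that reflects the network a host sits on over one that reflects its OS.

```python
from collections import Counter, defaultdict

# Hypothetical training hosts: every "linux" host happens to sit on a network
# with path_mtu 1500, and one linux host has a non-default initial_ttl.
TRAIN = [
    ({"initial_ttl": 64, "path_mtu": 1500}, "linux"),
    ({"initial_ttl": 64, "path_mtu": 1500}, "linux"),
    ({"initial_ttl": 128, "path_mtu": 1500}, "linux"),
    ({"initial_ttl": 128, "path_mtu": 1400}, "windows"),
    ({"initial_ttl": 128, "path_mtu": 1400}, "windows"),
    ({"initial_ttl": 128, "path_mtu": 1400}, "windows"),
]

# Hypothetical held-out hosts where path_mtu no longer tracks the OS.
TEST = [
    ({"initial_ttl": 64, "path_mtu": 1400}, "linux"),
    ({"initial_ttl": 128, "path_mtu": 1500}, "windows"),
    ({"initial_ttl": 64, "path_mtu": 1500}, "linux"),
    ({"initial_ttl": 128, "path_mtu": 1400}, "windows"),
]


def train_one_rule(samples):
    """Pick the single feature whose value -> label table best fits the samples."""
    default = Counter(label for _, label in samples).most_common(1)[0][0]
    best = None
    for feature in samples[0][0]:
        # Count labels seen for each observed value of this feature.
        counts = defaultdict(Counter)
        for features, label in samples:
            counts[features[feature]][label] += 1
        # Each value predicts its majority label.
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        if best is None or correct > best[0]:
            best = (correct, feature, table)
    _, feature, table = best
    return feature, table, default


def predict(model, features):
    """Look up the chosen feature's value; fall back to the majority label."""
    feature, table, default = model
    return table.get(features[feature], default)


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(model, features) == label for features, label in samples)
    return hits / len(samples)


model = train_one_rule(TRAIN)
print("chosen feature:", model[0])
print("training accuracy:", accuracy(model, TRAIN))
print("held-out accuracy:", accuracy(model, TEST))
```

On this invented data the learner selects path_mtu because it separates the training hosts perfectly, yet it classifies only half of the held-out hosts correctly. A human expert would likely recognize that path MTU is a property of the network path and discard it, which is the kind of semantic judgment the abstract argues automatic tools lack.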