SimFuzz: Test case similarity directed deep fuzzing

Authors:
Dazhi Zhang;Donggang Liu;Yu Lei;David Kung;Christoph Csallner;Nathaniel Nystrom;Wenhua Wang
Affiliations:
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States;Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States;Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States;Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States;Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States;Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States;Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States
Venue:
Journal of Systems and Software
Year:
2012

Abstract

Fuzzing is widely used to detect software vulnerabilities. Blackbox fuzzing does not require program source code; it mutates well-formed inputs to produce new ones. However, these new inputs usually do not exercise deep program semantics, because the probability that they satisfy the conditions guarding a deep program state is low. As a result, blackbox fuzzing is often limited to identifying vulnerabilities in the input validation components of a program. Domain knowledge such as input specifications can mitigate these limitations, but such knowledge is often expensive to obtain in practice. Whitebox fuzzing employs heavyweight analysis techniques, such as dynamic symbolic execution, to systematically generate test inputs and explore as many paths as possible. It is effective at reaching new program branches and thereby identifying more vulnerabilities. However, it faces fundamental challenges such as unsolvable constraints, and it is difficult to scale to large programs because of path explosion. This paper proposes a novel fuzzing approach that aims to produce test inputs that explore deep program semantics effectively and efficiently. The fuzzing process comprises two stages. In the first stage, a traditional blackbox fuzzing approach is applied for test data generation, guided by a novel test case similarity metric. In the second stage, a subset of the test inputs generated in the first stage is selected based on the test case similarity metric, and combination testing is then applied to the selected inputs to generate further new inputs. As a result, fewer redundant test inputs, i.e., inputs that explore only shallow program paths, are created in the first stage, and more distinct test inputs, i.e., inputs that explore deep program paths, are produced in the second stage. A prototype tool, SimFuzz, is developed and evaluated on real programs, and the experimental results are promising.
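
The two-stage process described in the abstract can be illustrated with a small sketch. The abstract does not define the similarity metric or the selection rule, so everything below is an assumption made for illustration, not the SimFuzz implementation: the toy target `trace`, the use of a normalized edit distance between branch traces as the similarity score, single-byte mutation in stage one, the rule of keeping mutations whose trace stays similar to the seed's trace, and pairwise combination in stage two. All function names are hypothetical.

```python
import random
from itertools import combinations


def trace(data: bytes) -> list:
    """Toy target (hypothetical): return the branch IDs executed for an input."""
    path = ["enter"]
    if len(data) < 4 or data[:2] != b"HD":
        return path + ["reject_header"]      # shallow exit: bad magic bytes
    path.append("header_ok")
    if data[2] != len(data) - 3:
        return path + ["reject_length"]      # shallow exit: bad length field
    path.append("length_ok")
    path.append("payload_high" if data[3] > 0x7F else "payload_low")
    return path


def edit_distance(a, b) -> int:
    """Classic Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(t1, t2) -> float:
    """Assumed metric: 1.0 for identical traces, approaching 0.0 as they diverge."""
    longest = max(len(t1), len(t2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(t1, t2) / longest


def stage_one(seed: bytes, rounds: int, rng: random.Random) -> list:
    """Blackbox stage: mutate one byte per round and score the resulting trace."""
    base = trace(seed)
    results = []
    for _ in range(rounds):
        pos = rng.randrange(len(seed))
        val = rng.randrange(256)
        mutated = seed[:pos] + bytes([val]) + seed[pos + 1:]
        results.append(((pos, val), similarity(base, trace(mutated))))
    return results


def stage_two(seed: bytes, results: list, threshold: float) -> list:
    """Combination stage: keep mutations whose trace stayed close to the seed's
    trace (assumed selection rule) and apply them in pairs."""
    kept = sorted({m for m, score in results if score >= threshold})
    combined = []
    for (p1, v1), (p2, v2) in combinations(kept, 2):
        if p1 == p2:
            continue                         # two values for one byte cannot combine
        buf = bytearray(seed)
        buf[p1], buf[p2] = v1, v2
        combined.append(bytes(buf))
    return combined
```

With a seed such as `b"HD\x02\x10\x20"`, calling `stage_one(seed, 50, random.Random(0))` and passing the result to `stage_two(seed, results, 0.7)` yields inputs that combine several mutations while still passing the toy header and length checks. In this sketch the threshold is a free parameter; how the actual tool selects inputs is not stated in the abstract.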