SimFuzz: Test case similarity directed deep fuzzing

  • Authors:
  • Dazhi Zhang;Donggang Liu;Yu Lei;David Kung;Christoph Csallner;Nathaniel Nystrom;Wenhua Wang

  • Affiliations:
  • Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, United States (all authors)

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2012

Abstract

Fuzzing is widely used to detect software vulnerabilities. Blackbox fuzzing does not require program source code; it mutates well-formed inputs to produce new ones. However, these new inputs usually do not exercise deep program semantics, because the probability that they satisfy the conditions needed to reach a deep program state is low. As a result, blackbox fuzzing is often limited to identifying vulnerabilities in the input validation components of a program. Domain knowledge such as input specifications can mitigate these limitations, but such knowledge is often expensive to obtain in practice. Whitebox fuzzing employs heavyweight analysis techniques, i.e., dynamic symbolic execution, to systematically generate test inputs and explore as many paths as possible. It is effective at exploring new program branches and can therefore identify more vulnerabilities. However, it faces fundamental challenges such as unsolvable constraints, and it is difficult to scale to large programs because of path explosion. This paper proposes a novel fuzzing approach that aims to produce test inputs that explore deep program semantics effectively and efficiently. The fuzzing process comprises two stages. In the first stage, a traditional blackbox fuzzing approach is applied to generate test data, guided by a novel test case similarity metric. In the second stage, a subset of the test inputs generated in the first stage is selected based on the test case similarity metric, and combination testing is then applied to these selected inputs to generate further new inputs. As a result, fewer redundant test inputs, i.e., inputs that explore only shallow program paths, are created in the first stage, and more distinct test inputs, i.e., inputs that explore deep program paths, are produced in the second stage. A prototype tool, SimFuzz, is developed and evaluated on real programs, and the experimental results are promising.
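
The two-stage process described in the abstract can be illustrated with a minimal Python sketch. It is not the SimFuzz implementation: the abstract does not define the similarity metric, the selection rule, or the combination strategy, so everything here is an assumption made for illustration. In particular, `toy_trace` is a hypothetical stand-in for instrumenting a real program, `similarity` uses Jaccard similarity over covered branches, stage one keeps only mutants whose trace differs from those already kept, and stage two merges the byte-level changes of each pair of kept inputs as a simple form of combination.

```python
import itertools
import random


def toy_trace(data: bytes) -> frozenset:
    """Hypothetical stand-in for the set of branches a program covers on `data`."""
    branches = {"start"}
    if data[:1] == b"H":
        branches.add("magic_ok")
        if len(data) > 4:
            branches.add("length_ok")
            if data[4:5] == b"!":
                branches.add("deep")
    return frozenset(branches)


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two traces (an assumed metric, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Blackbox mutation: overwrite one randomly chosen byte of a non-empty input."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)


def stage_one(seed, trace, rounds, rng, max_sim=0.99):
    """Mutate the seed; keep a mutant only if its trace is not too similar
    to any trace already kept (assumed selection rule)."""
    kept = [(seed, trace(seed))]
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        t = trace(candidate)
        if all(similarity(t, k) <= max_sim for _, k in kept):
            kept.append((candidate, t))
    return kept


def stage_two(kept, seed):
    """For each pair of kept inputs, merge their changes relative to the seed.
    Where the first input is unchanged, the second input's byte is used."""
    combined = []
    for (a, _), (b, _) in itertools.combinations(kept, 2):
        merged = bytes(y if x == s else x for s, x, y in zip(seed, a, b))
        combined.append(merged)
    return combined


if __name__ == "__main__":
    rng = random.Random(0)
    seed = b"HELLO"
    kept = stage_one(seed, toy_trace, rounds=2000, rng=rng)
    for data in stage_two(kept, seed):
        print(data, sorted(toy_trace(data)))
```

In a real setting the trace would come from program instrumentation, and the threshold `max_sim` and the pairwise merge would be replaced by whatever metric and combination strategy the tool actually uses.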