Information Retrieval
Using benchmarking to advance research: a challenge to software engineering
Proceedings of the 25th International Conference on Software Engineering
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Testing static analysis tools using exploitable buffer overflows from open source code
Proceedings of the 12th ACM SIGSOFT twelfth international symposium on Foundations of software engineering
Parfait: designing a scalable bug checker
Proceedings of the 2008 workshop on Static analysis
Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement
Practical and effective symbolic analysis for buffer overflow detection
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Static deep error checking in large system applications using parfait
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Hi-index | 0.00 |
Benchmarks for bug detection tools are still in their infancy. Though in recent years various tools and techniques were introduced, little effort has been spent on creating a benchmark suite and a harness for a consistent quantitative and qualitative performance measurement. For assessing the performance of a bug detection tool and determining which tool is better than another for the type of code to be looked at, the following questions arise: 1) how many bugs are correctly found, 2) what is the tool's average false positive rate, 3) how many bugs are missed by the tool altogether, and 4) does the tool scale. In this paper we present our contribution to the C bug detection community: two benchmark suites that allow developers and users to evaluate accuracy and scalability of a given tool. The two suites contain buggy, mature open source code; bugs are representative of "real world" bugs. A harness accompanies each benchmark suite to compute automatically qualitative and quantitative performance of a bug detection tool. BegBunch has been tested to run on the Solaris™, Mac OS X and Linux operating systems. We show the generality of the harness by evaluating it with our own Parfait and three publicly available bug detection tools developed by others.