The automated reasoning research community has grown accustomed to competitive events where a pool of systems is run on a pool of problem instances in order to rank the systems by their performance. At the heart of such a ranking lies the scoring method, i.e., the procedure that computes a numerical quantity summarizing the performance of each system relative to the other systems and to the pool of problem instances. In this paper we evaluate several scoring methods: methods used in automated reasoning contests, methods based on voting theory, and a new method that we introduce. Our research aims to establish which of these methods maximizes the effectiveness measures that we devised to quantify desirable properties of scoring procedures. Our approach is empirical: we compare the scoring methods by computing the effectiveness measures on the data from the 2005 comparative evaluation of solvers for quantified Boolean formulas. The results of our experiments give useful indications about the relative strengths and weaknesses of the scoring methods, and also allow us to draw some conclusions that are independent of the specific method adopted.
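To make the idea of a scoring method concrete, the sketch below contrasts two generic families mentioned above: a contest-style score (rank by number of solved instances, breaking ties by total CPU time) and a voting-theory score (a Borda count in which each instance "votes" by ranking the solvers by runtime). These are illustrative assumptions, not necessarily the exact variants evaluated in the paper, and the paper's new method is not reproduced here. The data layout, the function names `solved_then_time` and `borda`, the treatment of a failure (e.g., a timeout) as `None` and as ranking last, and the toy numbers are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical run data: results[solver][instance] is the CPU time in seconds,
# or None if the solver failed (e.g., timed out) on that instance.
Results = Dict[str, Dict[str, Optional[float]]]


def solved_then_time(results: Results) -> List[str]:
    """Contest-style ranking (assumed variant): more solved instances first,
    ties broken by lower total CPU time on the solved instances."""
    def key(solver: str) -> Tuple[int, float]:
        times = [t for t in results[solver].values() if t is not None]
        return (-len(times), sum(times))
    return sorted(results, key=key)


def borda(results: Results) -> List[str]:
    """Borda-style ranking (assumed variant): each instance ranks the solvers
    by runtime, with failures ranked last; a solver earns one point for every
    solver it strictly beats on that instance. Ties earn no points."""
    solvers = list(results)
    instances = {i for runs in results.values() for i in runs}
    score = {s: 0 for s in solvers}
    for inst in instances:
        # Map failures (and missing runs) to infinity so they rank last.
        t = {s: results[s].get(inst) for s in solvers}
        t = {s: float("inf") if v is None else v for s, v in t.items()}
        for s in solvers:
            score[s] += sum(1 for o in solvers if t[s] < t[o])
    # Stable sort: equal scores keep the input order of the solvers.
    return sorted(solvers, key=lambda s: -score[s])


# Toy data (invented for illustration): B solves the most instances,
# but A is fastest on every instance it solves.
runs: Results = {
    "A": {"i1": 1.0, "i2": 1.0, "i3": None, "i4": 1.0},
    "B": {"i1": 2.0, "i2": 2.0, "i3": 50.0, "i4": 2.0},
    "C": {"i1": 3.0, "i2": None, "i3": None, "i4": None},
}

print(solved_then_time(runs))  # ['B', 'A', 'C']
print(borda(runs))             # ['A', 'B', 'C']
```

On this toy data the two procedures disagree about the winner: the contest-style score rewards B for solving one more instance, while the Borda count rewards A for beating B on three of the four instances. Disagreements of this kind are what motivate comparing scoring methods against explicit effectiveness measures rather than adopting one by convention.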