Empirical Evaluation of Scoring Methods

  • Authors: Luca Pulina

  • Affiliations: Laboratory of Systems and Technologies for Automated Reasoning (STAR-Lab), DIST, Università di Genova, Viale Causa 13, 16145 Genova, Italy, pulina@dist.unige.it

  • Venue: Proceedings of the Third Starting AI Researchers' Symposium (STAIRS 2006)

  • Year: 2006

Abstract

The automated reasoning research community has grown accustomed to competitive events where a pool of systems is run on a pool of problem instances with the purpose of ranking the systems according to their performance. At the heart of such a ranking lies the method used to score the systems, i.e., the procedure used to compute a numerical quantity that should summarize the performance of a system with respect to the other systems and to the pool of problem instances. In this paper we evaluate several scoring methods, including methods used in automated reasoning contests, methods based on voting theory, and a new method that we introduce. Our research aims to establish which of these methods maximizes the effectiveness measures that we devised to quantify desirable properties of scoring procedures. Our approach is empirical, in that we compare the scoring methods by computing the effectiveness measures on the data from the 2005 comparative evaluation of solvers for quantified Boolean formulas. The results of our experiments give useful indications of the relative strengths and weaknesses of the scoring methods, and also allow us to draw some conclusions that are independent of the specific method adopted.
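
To make the notion of a scoring procedure concrete, the sketch below implements one voting-theory method of the kind the paper considers: a Borda-style count in which, on each instance, a solver earns one point per competitor it beats (solved versus unsolved first, then CPU time). The solver names, timings, and result data are invented for illustration; this is an assumed example, not the paper's own scoring methods or evaluation data.

```python
# A minimal sketch (assumed illustration, not the paper's actual procedure) of a
# Borda-style scoring method: on each instance, a solver earns one point for
# every competitor it beats, where "beats" means solving an instance the other
# did not, or solving it in strictly less CPU time.

# Hypothetical results: results[solver][instance] = CPU time in seconds,
# or None if the solver timed out or failed on that instance.
results = {
    "solverA": {"i1": 1.2, "i2": None, "i3": 45.0},
    "solverB": {"i1": 3.4, "i2": 120.0, "i3": None},
    "solverC": {"i1": None, "i2": 80.0, "i3": 50.0},
}

def borda_scores(results):
    solvers = list(results)
    instances = {i for runs in results.values() for i in runs}
    scores = dict.fromkeys(solvers, 0)
    for inst in instances:
        for s in solvers:
            for t in solvers:
                if s == t:
                    continue
                time_s, time_t = results[s].get(inst), results[t].get(inst)
                # s beats t if s solved the instance and t did not,
                # or if both solved it and s was strictly faster.
                if time_s is not None and (time_t is None or time_s < time_t):
                    scores[s] += 1
    return scores

# Rank the hypothetical solvers by their aggregate score.
ranking = sorted(borda_scores(results).items(), key=lambda kv: -kv[1])
print(ranking)  # e.g. [('solverA', 4), ('solverC', 3), ('solverB', 2)]
```

A different scoring method (for example, counting solved instances with CPU time as a tie-breaker) could rank the same hypothetical data differently; effectiveness measures of the kind described in the abstract are meant to compare such procedures against desirable properties.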