Evaluating improvements to modern SAT solvers and comparing two arbitrary solvers are challenging and important tasks. The relative performance of two solvers is usually assessed in a straightforward manner: both are run on a set of SAT instances, and the number of solved instances and the running times are compared. In this paper we point out the shortcomings of this approach and advocate more reliable, statistically founded methodologies that can better discriminate between good and bad ideas. We present one such methodology and illustrate its application.
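A small sketch can illustrate why a raw count of solved instances may mislead. It is not the methodology presented in this paper. Instead, it uses a standard exact two-sided sign test on paired per-instance runtimes, chosen here purely for illustration. The names `sign_test` and `solved`, the 600-second timeout, and all runtime values are hypothetical.

```python
from math import comb

# Hypothetical cutoff in seconds; an unsolved run is recorded as TIMEOUT.
TIMEOUT = 600.0


def solved(times, timeout):
    """Count instances finished strictly before the timeout (the naive metric)."""
    return sum(1 for t in times if t < timeout)


def sign_test(times_a, times_b):
    """Exact two-sided sign test on runtimes paired by instance.

    Instances with equal times (e.g. both solvers hit the timeout) are
    treated as ties and discarded, as is conventional for the sign test.
    Returns (wins_a, wins_b, p_value).
    """
    if len(times_a) != len(times_b):
        raise ValueError("runtime lists must be paired by instance")
    wins_a = sum(1 for a, b in zip(times_a, times_b) if a < b)
    wins_b = sum(1 for a, b in zip(times_a, times_b) if b < a)
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    k = min(wins_a, wins_b)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


# Hypothetical runtimes of two solvers on the same eight instances.
solver_a = [12.0, 45.3, 600.0, 3.1, 88.0, 600.0, 7.5, 150.2]
solver_b = [10.5, 50.1, 420.0, 2.9, 95.0, 600.0, 6.0, 160.0]

if __name__ == "__main__":
    print("solved by A:", solved(solver_a, TIMEOUT))
    print("solved by B:", solved(solver_b, TIMEOUT))
    print("sign test (wins A, wins B, p):", sign_test(solver_a, solver_b))
```

In this made-up data, solver B solves one more instance (7 vs. 6), so a naive comparison would rank it higher. However, the sign test finds 3 wins for A, 4 for B, and one tie, giving p = 1.0. That is no evidence of a real difference. The sign test was chosen because it makes no assumption about the shape of the runtime distribution, which matters when timeouts censor the data. It is only one of many possible tests.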