A formal analysis of why heuristic functions work

  • Authors:
  • B. John Oommen;Luis G. Rueda

  • Affiliations:
  • Senior Member, IEEE. School of Computer Science, Carleton University, 1125 Colonel By Dr., Ottawa, ON, K1S 5B6, Canada;School of Computer Science, University of Windsor, 401 Sunset Ave., Windsor, ON, N9B 3P4, Canada

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many optimization problems in computer science have been proven to be NP-hard, and it is unlikely that polynomial-time algorithms that solve these problems exist unless P = NP. Alternatively, they are solved using heuristics algorithms, which provide a sub-optimal solution that, hopefully, is arbitrarily close to the optimal. Such problems are found in a wide range of applications, including artificial intelligence, game theory, graph partitioning, database query optimization, etc. Consider a heuristic algorithm, A. Suppose that A could invoke one of two possible heuristic functions. The question of determining which heuristic function is superior, has typically demanded a yes/no answer--one which is often substantiated by empirical evidence. In this paper, by using Pattern Classification Techniques (PCT), we propose a formal, rigorous theoretical model that provides a stochastic answer to this problem. We prove that given a heuristic algorithm, A, that could utilize either of two heuristic functions H1 or H2 used to find the solution to a particular problem, if the accuracy of evaluating the cost of the optimal solution by using H1 is greater than the accuracy of evaluating the cost using H2, then H1 has a higher probability than H2 of leading to the optimal solution. This unproven conjecture has been the basis for designing numerous algorithms such as the A* algorithm, and its variants. Apart from formally proving the result, we also address the corresponding database query optimization problem that has been open for at least two decades. To validate our proofs, we report empirical results on database query optimization techniques involving a few well-known histogram estimation methods.