Learning from examples with unspecified attribute values

  • Authors: Sally A. Goldman; Stephen S. Kwek; Stephen D. Scott
  • Affiliations: Department of Computer Science, Washington University, St. Louis, MO; Division of Computer Science, University of Texas at San Antonio, San Antonio, TX; Department of Computer Science and Engineering, University of Nebraska, Lincoln, NE
  • Venue: Information and Computation
  • Year: 2003

Abstract

A challenging problem within machine learning is how to make good inferences from data sets in which pieces of information are missing. While it is valuable to have algorithms that perform well for specific domains, to gain a fundamental understanding of the problem, one needs a "theory" about how to learn with incomplete data. The important contribution of such a theory is not so much the specific algorithmic results, but rather that it provides good ways of thinking about the problem formally. In this paper we introduce the unspecified attribute value (UAV) learning model as a first step towards a theoretical framework for studying the problem of learning from incomplete data in the exact learning framework.

In the UAV learning model, an example x is classified positive (resp., negative) if all possible assignments for the unspecified attributes result in a positive (resp., negative) classification. Otherwise the classification given to x is "?" (for unknown). Given an example x in which some attributes are unspecified, the oracle UAV-MQ responds with the classification of x. Given a hypothesis h, the oracle UAV-EQ returns an example x (that could have unspecified attributes) for which h(x) is incorrect.

We show that any class of functions learnable in Angluin's exact model using the MQ and EQ oracles is also learnable in the UAV model using the MQ and UAV-EQ oracles as long as the counterexamples provided by the UAV-EQ oracle have a logarithmic number of unspecified attributes. We also show that any class learnable in the exact model using the MQ and EQ oracles is also learnable in the UAV model using the UAV-MQ and UAV-EQ oracles as well as an oracle to evaluate a given boolean formula on an example with unspecified attributes. (For some hypothesis classes such as decision trees and unate formulas the evaluation can be done in polynomial time without an oracle.)

We also study the learnability of a universal class of decision trees under the UAV model and of DNF formulas under a representation-dependent variation of the UAV model.
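
The UAV classification rule stated in the abstract can be illustrated with a small brute-force sketch. This is not the paper's algorithm; it only spells out the definition for boolean attributes. The names `uav_classify` and `conj`, and the use of `None` to mark an unspecified attribute, are assumptions made for this illustration. The function tries every completion of the unspecified attributes, so it makes up to 2^k evaluations of the target for k unspecified attributes, which stays polynomial only when k is logarithmic in the number of attributes.

```python
from itertools import product
from typing import Callable, Optional, Tuple

# Hypothetical encoding: an example is a tuple of 0/1 values,
# with None marking an unspecified attribute.
Example = Tuple[Optional[int], ...]


def uav_classify(target: Callable[[Tuple[int, ...]], bool], x: Example) -> str:
    """Return '+', '-', or '?' for x under the UAV classification rule."""
    free = [i for i, v in enumerate(x) if v is None]
    seen = set()
    # Enumerate all 2^k completions of the k unspecified attributes.
    for bits in product((0, 1), repeat=len(free)):
        full = list(x)
        for i, b in zip(free, bits):
            full[i] = b
        seen.add(bool(target(tuple(full))))
        if len(seen) == 2:
            # Completions disagree, so the classification is unknown.
            return "?"
    return "+" if True in seen else "-"


def conj(v: Tuple[int, ...]) -> bool:
    """Illustrative target concept: x0 AND x1 over three attributes."""
    return v[0] == 1 and v[1] == 1


print(uav_classify(conj, (1, 1, None)))     # every completion is positive
print(uav_classify(conj, (0, None, None)))  # every completion is negative
print(uav_classify(conj, (1, None, 0)))     # completions disagree
```

In this sketch the third attribute is irrelevant to the target, so leaving it unspecified never changes the answer, while leaving a relevant attribute unspecified yields "?" unless another specified attribute already forces the outcome.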