Active learning with multiple views

  • Authors:
  • Craig Knoblock; Ion Alexandru Muslea

  • Year:
  • 2002

Abstract

Labeling training data for machine learning algorithms is tedious, time-consuming, and error-prone. Consequently, it is of utmost importance to minimize the amount of labeled data required to learn a target concept. In the work presented here, I focus on reducing the need for labeled data in multi-view learning tasks. The key characteristic of multi-view learning tasks is that the target concept can be learned independently within different views (i.e., disjoint sets of features, each of which is sufficient to learn the concept of interest). For instance, robot navigation is a 2-view learning task because a robot can learn to avoid obstacles based on either sonar or vision sensors. In my dissertation, I make three main contributions.

First, I introduce Co-Testing, an active learning algorithm that exploits multiple views. Co-Testing is based on the idea of learning from mistakes: it queries examples on which the views predict different labels, because if two views disagree, one of them is guaranteed to be making a mistake. In a variety of real-world domains, from information extraction to text classification and discourse tree parsing, Co-Testing outperforms existing active learners.

Second, I show that existing multi-view learners can perform unreliably if the views are incompatible or correlated. To cope with this problem, I introduce a robust multi-view learner, Co-EMT, which interleaves semi-supervised and active multi-view learning. My empirical results show that Co-EMT outperforms existing multi-view learners on a wide variety of learning tasks.

Third, I introduce a view validation algorithm that predicts whether or not two views are adequate for solving a new, unseen learning task. View validation uses information acquired while solving several exemplar learning tasks to train a classifier that discriminates between tasks for which the views are adequate and those for which they are not. My experiments on wrapper induction and text classification show that view validation requires only a modest amount of training data to make high-accuracy predictions.
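The core Co-Testing loop described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the `train`/`predict` toy learner (a 1-nearest-neighbor on a single numeric feature per view) and the data layout are hypothetical stand-ins for the actual learners used in the dissertation, and the random choice among contention points corresponds to the naive variant of Co-Testing.

```python
import random

def train(xs, ys):
    # Toy single-feature learner: memorize (feature, label) pairs.
    return list(zip(xs, ys))

def predict(model, x):
    # 1-nearest-neighbor prediction on the single numeric feature.
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

def co_testing(view1, view2, labels, start, n_queries=5, seed=0):
    """Sketch of naive Co-Testing: repeatedly query an unlabeled example
    on which the two view-specific hypotheses disagree (a contention point)."""
    rng = random.Random(seed)
    labeled = list(start)
    unlabeled = [i for i in range(len(labels)) if i not in labeled]
    for _ in range(n_queries):
        # Learn one hypothesis per view from the currently labeled examples.
        h1 = train([view1[i] for i in labeled], [labels[i] for i in labeled])
        h2 = train([view2[i] for i in labeled], [labels[i] for i in labeled])
        # Contention points: unlabeled examples on which the views disagree,
        # so at least one hypothesis is guaranteed to be wrong.
        contention = [i for i in unlabeled
                      if predict(h1, view1[i]) != predict(h2, view2[i])]
        if not contention:        # the views agree everywhere: stop querying
            break
        q = rng.choice(contention)  # naive variant: query a random contention point
        labeled.append(q)           # ask the oracle for labels[q]
        unlabeled.remove(q)
    return labeled

# Tiny illustration: both views are consistent except on example 4,
# where view2 misleads its learner, so example 4 is the query.
queried = co_testing(view1=[1, 2, 8, 9, 6],
                     view2=[1, 2, 8, 9, 1],
                     labels=[0, 0, 1, 1, 1],
                     start=[0, 2])  # → [0, 2, 4]
```

After each query the oracle's label shrinks the disagreement region, which is why querying contention points tends to need far fewer labels than querying at random.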