Complexity bounds for batch active learning in classification

  • Authors:
  • Philippe Rolet; Olivier Teytaud

  • Affiliations:
  • Inria, LRI, UMR CNRS, Université Paris-Sud, Orsay Cedex, France (both authors)

  • Venue:
  • ECML PKDD'10: Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases, Part III
  • Year:
  • 2010

Abstract

Active learning [1] is a branch of machine learning in which the learning algorithm, instead of being directly provided with pairs of problem instances and their solutions (their labels), is allowed to choose, from a set of unlabeled data, which instances to query. It is suited to settings where labeling instances is costly. This paper analyzes the speed-up of batch (parallel) active learning over sequential active learning (where instances are chosen one by one): how much faster can an algorithm become if it can query λ instances at once? There are two main contributions: lower and upper bounds on the possible gain, and experiments illustrating these bounds with standard active learning algorithms. Roughly speaking, the speed-up is asymptotically logarithmic in the batch size λ (i.e., as λ → ∞). However, for some classes of functions with finite VC-dimension V, a linear speed-up can be achieved up to a batch size of V. Practically speaking, this means that on an expensive-to-label problem suited to active learning, parallelizing queries is very beneficial up to V simultaneous queries, and less interesting (though still an improvement) beyond that. A minimal sketch of the batch setting appears below.
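
To make the setting concrete, here is a minimal sketch of pool-based batch active learning with uncertainty sampling. This is an illustration of the query model the paper analyzes, not the paper's own algorithm; the classifier, the oracle interface, and the seeding strategy are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def batch_active_learning(X_pool, oracle, n_rounds, lam, ):
    """Pool-based active learning where each round queries `lam` instances
    at once. lam=1 recovers the sequential setting; larger lam is the
    batch (parallel) setting whose speed-up the paper bounds.

    `oracle(i)` returns the (costly) label of pool instance i; it is a
    hypothetical interface standing in for a human annotator or experiment.
    """
    # Seed the labeled set with one example of each class
    # (assumes both classes occur in the pool).
    pos = next(i for i in range(len(X_pool)) if oracle(i) == 1)
    neg = next(i for i in range(len(X_pool)) if oracle(i) == 0)
    labeled = [pos, neg]
    unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

    model = LogisticRegression()
    for _ in range(n_rounds):
        model.fit(X_pool[labeled], [oracle(i) for i in labeled])
        # Uncertainty sampling: query the lam pool points whose predicted
        # class probability is closest to 0.5 (most ambiguous).
        proba = model.predict_proba(X_pool[unlabeled])[:, 1]
        order = np.argsort(np.abs(proba - 0.5))[:lam]
        batch = [unlabeled[j] for j in order]
        labeled += batch
        unlabeled = [i for i in unlabeled if i not in batch]
    return model
```

Comparing the label budget needed to reach a fixed accuracy with lam = 1 against lam > 1 gives an empirical view of the speed-up: roughly linear gains up to a batch size around the VC-dimension V, and only logarithmic gains as lam grows further.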