Double-bootstrapping source data selection for instance-based transfer learning

  • Authors:
  • Di Lin; Xing An; Jian Zhang

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Abstract

Instance-based transfer is an important paradigm for transfer learning, in which data from related tasks (source data) are combined with the data for the current learning task (target data) to train a learner for the current (target) task. However, in most application scenarios, the benefit of the source data is unclear. The source may contain both instances that help and instances that harm learning on the target task. Simply combining the source with the target data may degrade performance (negative transfer). Selecting the source instances that will benefit the target task is therefore a key step in instance-based transfer learning. Most existing instance-based transfer methods either lack such selection or mix source selection with training for the target task. This is problematic because the training may use source data that are harmful to the target. We propose a simple yet effective method for instance-based transfer learning in environments where the usefulness of the sources is unclear. The method employs a double-selection process, based on bootstrapping, to reduce the impact of irrelevant or harmful data in the source. Experimental results show that, in most cases, our method produces greater improvements through transfer than TrBagg (Kamishima et al., 2009) and TrAdaBoost (Dai et al., 2009). Our method can also deal with a wider range of transfer learning scenarios.
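
The abstract does not give the algorithm's details, so the following is only an illustrative sketch of how a bootstrap-based, two-pass selection of source instances could look. It is not the authors' procedure. Everything in it is an assumption made for illustration: the nearest-centroid base learner, the majority-vote rule, the hypothetical names `bootstrap_filter` and `double_selection`, and the parameters `n_rounds` and `keep_ratio`. The first pass scores source instances with models trained on bootstrap samples of the target data alone; the second pass re-scores the survivors with models trained on the target data plus those survivors.

```python
import random


def fit_centroids(data):
    """Nearest-centroid base learner (an assumption): mean vector per label."""
    sums, counts = {}, {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def bootstrap_filter(train_pool, candidates, n_rounds, keep_ratio, rng):
    """Keep candidates that enough bootstrap-trained models label correctly."""
    votes = [0] * len(candidates)
    for _ in range(n_rounds):
        # Bootstrap: draw len(train_pool) instances with replacement.
        sample = [rng.choice(train_pool) for _ in train_pool]
        centroids = fit_centroids(sample)
        for i, (x, y) in enumerate(candidates):
            if predict(centroids, x) == y:
                votes[i] += 1
    return [c for c, v in zip(candidates, votes) if v >= keep_ratio * n_rounds]


def double_selection(target, source, n_rounds=25, keep_ratio=0.5, seed=0):
    """Two filtering passes (hypothetical scheme, not the paper's algorithm)."""
    rng = random.Random(seed)
    # Pass 1: judge source instances using the target data only.
    first = bootstrap_filter(target, source, n_rounds, keep_ratio, rng)
    # Pass 2: re-judge the survivors using target data plus survivors.
    return bootstrap_filter(target + first, first, n_rounds, keep_ratio, rng)


# Toy data: class 0 lies near 0 and class 1 near 10 in the target task.
target = [([0.0], 0), ([1.0], 0), ([2.0], 0), ([8.0], 1), ([9.0], 1), ([10.0], 1)]
# The source mixes consistent instances with label-flipped (harmful) ones.
source = [([0.5], 0), ([9.5], 1), ([0.5], 1), ([9.5], 0)]

kept = double_selection(target, source)
print(kept)
```

In this toy setup the label-flipped source instances disagree with nearly every bootstrap model and are discarded, while the consistent ones are retained and could then be combined with the target data for training.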