On minimum distribution discrepancy support vector machine for domain adaptation

  • Authors:
  • Jianwen Tao;Fu-Lai Chung;Shitong Wang

  • Affiliations:
  • School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China;Department of Computing, Hong Kong Polytechnic University, Hong Kong, China;School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China and Department of Computing, Hong Kong Polytechnic University, Hong Kong, China

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Quantified Score

Hi-index 0.01

Visualization

Abstract

Domain adaptation learning (DAL) is a novel and effective technique to address pattern classification problems where the prior information for training is unavailable or insufficient. Its effectiveness depends on the discrepancy between the two distributions that respectively generate the training data for the source domain and the testing data for the target domain. However, DAL may not work so well when only the distribution mean discrepancy between source and target domains is considered and minimized. In this paper, we first construct a generalized projected maximum distribution discrepancy (GPMDD) metric for DAL on reproducing kernel Hilbert space (RKHS) based domain distributions by simultaneously considering both the projected maximum distribution mean and the projected maximum distribution scatter discrepancy between the source and the target domain. In the sequel, based on both the structure risk and the GPMDD minimization principle, we propose a novel domain adaptation kernelized support vector machine (DAKSVM) with respect to the classical SVM, and its two extensions called LS-DAKSVM and @m-DAKSVM with respect to the least-square SVM and the v-SVM, respectively. Moreover, our theoretical analysis justified that the proposed GPMDD metric could effectively measure the consistency not only between the RKHS embedding domain distributions but also between the scatter information of source and target domains. Hence, the proposed methods are distinctive in that the more consistency between the scatter information of source and target domains can be achieved by tuning the kernel bandwidth, the better the convergence of GPMDD metric minimization is and thus improving the scalability and generalization capability of the proposed methods for DAL. Experimental results on artificial and real-world problems indicate that the performance of the proposed methods is superior to or at least comparable with existing benchmarking methods.