Column subset selection for active learning in image classification

  • Authors:
  • Jianfeng Shen;Bin Ju;Tao Jiang;Jingjing Ren;Miao Zheng;Chengwei Yao;Lanjuan Li

  • Affiliations:
  • The First Affiliated Hospital of College of Medical School, Zhejiang University, China and Zhejiang Medical College, China and Zhejiang Province Health Bureau Center of Information, China;Zhejiang Province Health Bureau Center of Information, China;Zhejiang Province Health Bureau Center of Information, China;The First Affiliated Hospital of College of Medical School, Zhejiang University, China;College of Computer Science and Technology, Zhejiang University, China;College of Computer Science and Technology, Zhejiang University, China;State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, No. 216, Qingchun Road, Hangzhou, Zhejiang 310006, China

  • Venue:
  • Neurocomputing
  • Year:
  • 2011

Abstract

Image classification is an important task in computer vision and machine learning. Manually labeling images is time-consuming and expensive, whereas unlabeled images are easily available. Active learning tries to determine which unlabeled data points would be the most informative (i.e., would improve the classifier the most) if they were labeled and used as training samples. In this paper, we introduce the idea of column subset selection, which aims to select the most representative columns from a data matrix, into active learning and propose a novel active learning algorithm, column subset selection for active learning (CSS_active). CSS_active selects the most representative images to label; the other images are then reconstructed from these labeled images. The goal of CSS_active is to minimize the reconstruction error. Most previous active learning approaches are based on linear models and hence consider only linear functions, so they fail to discover the intrinsic geometry of images when the image space is highly nonlinear. To address this problem, we provide a kernel-based column subset selection for active learning (KCSS_active) algorithm, which performs active learning in a Reproducing Kernel Hilbert Space (RKHS) instead of the original image space. Experimental results on the Yale, AT&T and COIL20 data sets demonstrate the effectiveness of our proposed approaches.
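The abstract does not spell out the optimization procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple greedy heuristic: at each step, pick the sample whose addition most reduces the error of reconstructing all samples from the selected ones. The function names `greedy_css` and `rbf_kernel`, and the choice of an RBF kernel with parameter `gamma`, are hypothetical and chosen for illustration. Working on a Gram matrix lets the same routine cover both settings: a linear Gram matrix `X.T @ X` corresponds to selection in the original space, while a kernel Gram matrix corresponds to selection in an RKHS.

```python
import numpy as np


def greedy_css(K, k, eps=1e-12):
    """Greedily select up to k sample indices from an n x n Gram matrix K.

    Illustrative heuristic (an assumption, not the paper's method).
    Returns the selected indices and the remaining reconstruction error,
    measured as the trace of the residual Gram matrix.
    """
    R = np.array(K, dtype=float)  # residual Gram matrix, deflated in place
    selected = []
    for _ in range(k):
        diag = np.diag(R).copy()
        # Error reduction from adding sample j: ||R[:, j]||^2 / R[j, j].
        gain = (R ** 2).sum(axis=0) / np.maximum(diag, eps)
        scores = np.where(diag > eps, gain, -np.inf)
        if selected:
            scores[selected] = -np.inf  # never pick the same sample twice
        j = int(np.argmax(scores))
        if not np.isfinite(scores[j]):
            break  # nothing left to explain
        selected.append(j)
        # Remove the component explained by sample j from every sample.
        R -= np.outer(R[:, j], R[j, :]) / R[j, j]
    return selected, float(np.trace(R))


def rbf_kernel(X, gamma):
    """RBF Gram matrix for samples stored as the columns of X (d x n)."""
    sq = (X ** 2).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.exp(-gamma * np.maximum(d2, 0.0))
```

With images stored as the columns of a matrix `X`, `greedy_css(X.T @ X, k)` would return candidate images to label in the linear setting, and `greedy_css(rbf_kernel(X, gamma), k)` would do the same in the kernel setting. The returned error is non-increasing in `k`, which gives a rough way to judge how many labels are worth requesting.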