S3MKL: scalable semi-supervised multiple kernel learning for image data mining

  • Authors:
  • Shuhui Wang;Shuqiang Jiang;Qingming Huang;Qi Tian

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate University, Chinese Academy of Sciences, Beijing, China; University of Texas at San Antonio, San Antonio, TX, USA

  • Venue:
  • Proceedings of the international conference on Multimedia
  • Year:
  • 2010

Abstract

For large-scale image data mining, a challenging problem is to design a method that works efficiently when little ground-truth annotation is available and a large amount of data is unlabeled or noisy. As one of the major solutions, semi-supervised learning (SSL) has been extensively investigated and widely used in image classification, ranking, and retrieval. However, most SSL approaches cannot incorporate multiple information sources. Furthermore, no sample selection is performed on the unlabeled data, which brings unpredictable risk from uncontrolled unlabeled data and a heavy computational burden that is unsuitable for learning on real-world datasets. In this paper, we propose a scalable semi-supervised multiple kernel learning method (S3MKL) to address the first problem. Our method imposes group LASSO regularization on the kernel coefficients to avoid over-fitting, and a conditional expectation consensus to regularize the behaviors of the different kernels on the unlabeled data. To reduce the risk of using unlabeled data, we also design a multiple kernel locality sensitive hashing (MKLSH) system, constructed with respect to the different kernels, to identify an "informative" and "compact" unlabeled training subset from a large unlabeled data corpus. By combining S3MKL with MKLSH, the method becomes suitable for real-world image classification and personalized web image re-ranking with very little user interaction. Comprehensive experiments are conducted to evaluate our method, and the results show that it is promising for large-scale real-world image classification and retrieval.
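The abstract says that S3MKL places group LASSO regularization on the kernel coefficients, but it does not say here how the coefficients are grouped or how the objective is optimized. The sketch below is a generic illustration under stated assumptions, not the authors' algorithm. It assumes that the learned kernel is a weighted sum of precomputed base Gram matrices and that the group LASSO penalty is handled with its standard proximal operator, block soft-thresholding. The function names and the example grouping are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def combine_kernels(kernels: Sequence[Matrix], beta: Sequence[float]) -> Matrix:
    """Assumed MKL form: K = sum_m beta_m * K_m over precomputed Gram matrices."""
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for weight, k in zip(beta, kernels):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * k[i][j]
    return combined


def group_lasso_penalty(beta: Sequence[float], groups: Sequence[Sequence[int]]) -> float:
    """Group LASSO penalty sum_g ||beta_g||_2 (the grouping is a hypothetical choice)."""
    return sum(math.sqrt(sum(beta[i] ** 2 for i in g)) for g in groups)


def group_soft_threshold(
    beta: Sequence[float], groups: Sequence[Sequence[int]], lam: float
) -> List[float]:
    """Proximal operator of lam * sum_g ||beta_g||_2 (block soft-thresholding).

    Each group is scaled by max(0, 1 - lam / ||beta_g||_2), so a group whose
    norm is at most lam is set to zero as a whole (group-level sparsity).
    Indices not covered by any group are returned unchanged.
    """
    out = list(beta)
    for g in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * beta[i]
    return out


if __name__ == "__main__":
    # Two toy 2x2 base kernels combined with hypothetical weights.
    k_identity = [[1.0, 0.0], [0.0, 1.0]]
    k_constant = [[1.0, 1.0], [1.0, 1.0]]
    print(combine_kernels([k_identity, k_constant], [0.5, 2.0]))

    # Four kernel coefficients in two hypothetical groups.
    beta = [0.6, 0.8, 0.1, 0.05]
    print(group_soft_threshold(beta, [[0, 1], [2, 3]], lam=0.5))
```

In the usage above, the first group has norm 1.0 and is shrunk by half to roughly `[0.3, 0.4]`. The second group has norm of about 0.11, which is below `lam`, so it is zeroed out entirely. This behavior, dropping whole groups of coefficients rather than individual ones, is why a group penalty is a natural choice for discarding uninformative kernels.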