Nearest-neighbor classification using unlabeled data for real world image application

  • Authors:
  • Shuhui Wang;Qingming Huang;Shuqiang Jiang;Qi Tian

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Graduate University, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;University of Texas at San Antonio, San Antonio, TX, USA

  • Venue:
  • Proceedings of the international conference on Multimedia
  • Year:
  • 2010

Abstract

Nearest-Neighbor (NN) approaches are widely applied to real-world image data mining. These approaches have three disadvantages: (i) their performance is inferior on small datasets; (ii) the performance of approximate nearest neighbor search degrades on high-dimensional data; (iii) they depend heavily on the chosen feature and distance measure. To overcome these intrinsic weaknesses, we propose a novel Nearest-Neighbor method that improves the original NN approaches in three respects. First, we propose a novel neighborhood similarity measure, in which the similarity between a test image and a labeled image in the database is calculated jointly from the original image-to-image similarity and the average similarity of their neighboring unlabeled data. Second, we adopt kernelized locality-sensitive hashing to conduct the nearest neighbor search efficiently on high-dimensional data. Finally, to make the method more robust across different genres of images, we fuse the discriminative power of different features by considering all the nearest neighbors retrieved by hashing systems that use different features/kernels. Experimental results show an advantage over traditional Nearest-Neighbor methods that use the labeled data only. Even when the ratio of labeled data is very small, our method still achieves remarkable results, thanks to the help of unlabeled data and multiple features.
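The abstract does not state the exact form of the neighborhood similarity, so the following is only a minimal sketch of the idea under stated assumptions. It assumes a Gaussian kernel as the image-to-image similarity, a hypothetical mixing weight `alpha`, and a hypothetical neighborhood size `m`; the function names are illustrative, not the authors'. It also uses a brute-force neighbor search in place of kernelized locality-sensitive hashing and a single feature instead of multiple features/kernels.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian similarity between every row of A and every row of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def neighborhood_similarity(X_test, X_lab, X_unl, m=3, alpha=0.5, gamma=1.0):
    """Blend direct similarity with the mean similarity between the m nearest
    unlabeled neighbors of each test image and each labeled image.
    ASSUMPTION: the weighting (alpha) and neighborhood size (m) are
    hypothetical; the paper's actual formula may differ."""
    direct = rbf_kernel(X_test, X_lab, gamma)                # (T, L)
    # Indices of the m most similar unlabeled points (brute force, not hashing).
    nn_test = np.argsort(-rbf_kernel(X_test, X_unl, gamma), axis=1)[:, :m]
    nn_lab = np.argsort(-rbf_kernel(X_lab, X_unl, gamma), axis=1)[:, :m]
    K_uu = rbf_kernel(X_unl, X_unl, gamma)
    # Similarities of all m*m unlabeled neighbor pairs for every (test, labeled) pair.
    pair = K_uu[nn_test[:, None, :, None], nn_lab[None, :, None, :]]  # (T, L, m, m)
    indirect = pair.mean(axis=(2, 3))                        # (T, L)
    return alpha * direct + (1.0 - alpha) * indirect

def classify(X_test, X_lab, y_lab, X_unl, **kwargs):
    """Assign each test image the label of its most similar labeled image."""
    S = neighborhood_similarity(X_test, X_lab, X_unl, **kwargs)
    return y_lab[np.argmax(S, axis=1)]
```

With `alpha=1.0` the sketch reduces to plain 1-NN on the labeled data; lowering `alpha` lets the unlabeled neighborhoods contribute, which is the situation the abstract targets when labeled data is scarce.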