Improving multi-label classification using semi-supervised learning and dimensionality reduction

  • Authors:
  • Eakasit Pacharawongsakda; Cholwich Nattee; Thanaruk Theeramunkong

  • Affiliations:
  • School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand (all authors)

  • Venue:
  • PRICAI'12: Proceedings of the 12th Pacific Rim International Conference on Trends in Artificial Intelligence
  • Year:
  • 2012

Abstract

Multi-label classification has received increasing attention since it can assign multiple class labels to a single object. This paper proposes a new method that simultaneously addresses two major problems in multi-label classification: (1) the requirement of sufficient labeled data for training and (2) high dimensionality in the feature and label spaces. For the first issue, we extend semi-supervised learning to multi-label classification and exploit unlabeled data whose tagged labels have high average confidence as additional training data. For the second issue, we present two alternative dimensionality-reduction approaches based on Singular Value Decomposition (SVD). The first, LAbel Space Transformation for CO-training REgressor (LAST-CORE), reduces complexity in the label space, while the second, Feature and LAbel Space Transformation for CO-training REgressor (FLAST-CORE), compresses both the label and feature spaces. In both approaches, the co-training regression method predicts values in the lower-dimensional spaces, and the original space is then reconstructed using the orthogonal property of SVD with adaptive threshold setting. Additionally, we introduce a parallel-computation method to speed up the co-training regression. Experiments on three real-world datasets show that our semi-supervised learning methods achieve better performance than a method that uses only the labeled data. Moreover, for dimensionality reduction, the LAST-CORE approach tends to obtain better classification performance, while the FLAST-CORE approach helps save computational time.
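The label-space compression and reconstruction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy label matrix, the choice of k, and the fixed 0.5 threshold (the paper uses an adaptive threshold, and trains co-training regressors to predict the compressed representation from features) are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy binary label matrix Y (instances x labels);
# illustrative only, not from the paper's datasets.
Y = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
], dtype=float)

# SVD: Y = U S V^T. Keeping the top-k right singular vectors compresses
# the label space, in the spirit of LAST-CORE (a sketch, not the
# authors' exact procedure).
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
k = 2
Z = Y @ Vt[:k].T          # lower-dimensional label representation (n x k)

# In the paper, co-training regressors predict Z from the features of
# unlabeled data; here we simply map Z back using the orthogonality of
# the singular vectors: Y_hat = Z V_k.
Y_hat = Z @ Vt[:k]

# Threshold the reconstruction to recover binary labels
# (fixed at 0.5 here; the paper sets this threshold adaptively).
Y_pred = (Y_hat >= 0.5).astype(int)
print(Y_pred)
```

With k equal to the full rank, the reconstruction is exact; choosing k smaller trades reconstruction fidelity for fewer regression targets, which is the source of FLAST-CORE's time savings when the same idea is also applied to the feature space.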