An extensive experimental comparison of methods for multi-label learning

  • Authors:
  • Gjorgji Madjarov;Dragi Kocev;Dejan Gjorgjevikj;SašO Deroski

  • Affiliations:
  • Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Rugjer Boshkovikj 16, 1000 Skopje, Macedonia and Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia; Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia; Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Rugjer Boshkovikj 16, 1000 Skopje, Macedonia; Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia

  • Venue:
  • Pattern Recognition
  • Year:
  • 2012

Abstract

Multi-label learning has received significant attention in the research community over the past few years, resulting in the development of a variety of multi-label learning methods. In this paper, we present an extensive experimental comparison of 12 multi-label learning methods using 16 evaluation measures over 11 benchmark datasets. We selected the competing methods based on their previous usage by the community, the representation of different groups of methods, and the variety of basic underlying machine learning methods. Similarly, we selected the evaluation measures to assess the behavior of the methods from a variety of viewpoints. To make the conclusions independent of the application domain, we use 11 datasets from different domains. Furthermore, we compare the methods by their efficiency in terms of the time needed to learn a classifier and the time needed to produce a prediction for an unseen example. We analyze the experimental results using Friedman and Nemenyi tests to assess the statistical significance of differences in performance. The analysis shows that the best-performing methods overall for multi-label classification are random forests of predictive clustering trees (RF-PCT) and the hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC). Furthermore, RF-PCT exhibited the best performance according to all measures for multi-label ranking. The recommendation from this study is that newly proposed methods for multi-label learning should be compared to RF-PCT and HOMER using multiple evaluation measures.
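
The statistical comparison described in the abstract (ranking methods on each dataset and testing the rank differences with Friedman and Nemenyi tests) can be illustrated with a short sketch. The snippet below is not the authors' code: the method names are taken from the abstract, the scores are placeholder numbers rather than results from the paper, and the Nemenyi critical difference follows the standard formula from Demšar (2006).

```python
# Minimal sketch of a Friedman + Nemenyi comparison of multi-label methods.
# Scores are hypothetical; rows = datasets, columns = methods (higher is better).
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

methods = ["RF-PCT", "HOMER", "BR", "CC"]
scores = np.array([
    [0.81, 0.79, 0.74, 0.75],
    [0.68, 0.70, 0.65, 0.66],
    [0.90, 0.88, 0.85, 0.86],
    [0.77, 0.74, 0.72, 0.73],
    [0.59, 0.61, 0.55, 0.57],
])

# Friedman test: do the methods differ significantly across datasets?
stat, p_value = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")

# Average rank of each method (rank 1 = best on a dataset).
ranks = np.apply_along_axis(lambda row: rankdata(-row), 1, scores)
avg_ranks = ranks.mean(axis=0)

# Nemenyi critical difference: CD = q_alpha * sqrt(k(k+1) / (6N)),
# with q_alpha the studentized-range value for k = 4 methods at alpha = 0.05.
k, n = scores.shape[1], scores.shape[0]
q_alpha = 2.569
cd = q_alpha * np.sqrt(k * (k + 1) / (6.0 * n))

for name, r in zip(methods, avg_ranks):
    print(f"{name}: average rank {r:.2f}")
print(f"Critical difference = {cd:.2f}")
```

Two methods whose average ranks differ by more than the critical difference are considered significantly different at the chosen significance level; this is the kind of analysis the study uses to conclude that RF-PCT and HOMER lead the comparison.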