MCut: a thresholding strategy for multi-label classification

  • Authors:
  • Christine Largeron; Christophe Moulin; Mathias Géry

  • Affiliations:
  • Université de Lyon, Saint-Étienne, France; Laboratoire Hubert Curien, CNRS UMR 5516, France; Université de Saint-Étienne, France (all three authors)

  • Venue:
  • IDA'12 Proceedings of the 11th international conference on Advances in Intelligent Data Analysis
  • Year:
  • 2012

Abstract

Multi-label classification is a frequent task in machine learning, notably in text categorization. When binary classifiers are not suited, an alternative is to use a multiclass classifier that provides a score per category for each document, and then to apply a thresholding strategy to select the set of categories to assign to the document. The common thresholding strategies, such as the RCut, PCut and SCut methods, need a training step to determine the value of the threshold. To overcome this limitation, we propose a new strategy, called MCut, which automatically estimates a value for the threshold. This method does not have to be trained and does not need any parametrization. Experiments performed on two textual corpora, the XML Mining 2009 and RCV1 collections, show that the results of the MCut strategy are on par with the state of the art, while MCut is easy to implement and parameter-free.
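
To make the idea of thresholding per-category scores concrete, here is a minimal Python sketch. The abstract does not spell out how MCut computes its threshold, so the `max_gap_cut` function below rests on an assumption: that the threshold is placed at the midpoint of the largest gap between consecutive scores sorted in decreasing order. The function names, the example categories and the scores are all hypothetical. `rcut` is shown only for contrast, as a rank-based rule whose parameter `k` has to be chosen beforehand.

```python
from typing import Dict, Set


def rcut(scores: Dict[str, float], k: int) -> Set[str]:
    """Rank-based cut: keep the k best-scoring categories.

    The value of k is a parameter that must be tuned beforehand.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def max_gap_cut(scores: Dict[str, float]) -> Set[str]:
    """Parameter-free cut (assumed rule, not taken from the abstract).

    Sort the scores in decreasing order, locate the largest difference
    between two consecutive scores, and use the midpoint of that gap
    as the threshold for this document.
    """
    ordered = sorted(scores.values(), reverse=True)
    if len(ordered) < 2:
        # Nothing to compare against: keep whatever is there.
        return set(scores)
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    if gaps[i] == 0:
        # All scores are equal, so no gap separates the categories.
        return set(scores)
    threshold = (ordered[i] + ordered[i + 1]) / 2
    return {cat for cat, s in scores.items() if s > threshold}


# Hypothetical classifier scores for one document.
doc_scores = {
    "sports": 0.81,
    "politics": 0.74,
    "economy": 0.22,
    "science": 0.15,
    "arts": 0.05,
}

print(rcut(doc_scores, k=1))
print(max_gap_cut(doc_scores))
```

With these scores, the largest gap lies between 0.74 and 0.22, so the gap-based rule keeps `sports` and `politics` without any tuned parameter, whereas `rcut` with `k=1` keeps only `sports`.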