Machine Learning - Special issue on inductive transfer
In multi-label classification (MLC), each instance is associated with a subset of labels instead of a single class, as in conventional classification, and this generalization enables the definition of a multitude of loss functions. Indeed, a large number of losses have already been proposed and are commonly applied as performance metrics in experimental studies. However, even though these loss functions are quite different in nature, a concrete connection between the type of multi-label classifier used and the loss to be minimized is rarely established, implicitly giving the misleading impression that the same method can be optimal for different loss functions. In this paper, we elaborate on risk minimization in MLC and the connection between different loss functions, both theoretically and empirically. In particular, we compare two important loss functions, namely the Hamming loss and the subset 0/1 loss. We perform a regret analysis, showing how poor a classifier intended to minimize the subset 0/1 loss can become in terms of Hamming loss, and vice versa. The theoretical results are corroborated by experimental studies, and their implications for MLC methods are discussed in a broader context.
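To make the two compared metrics concrete, the following is a minimal sketch (plain Python with NumPy; the function names and toy data are illustrative assumptions, not the authors' code) of how the Hamming loss and the subset 0/1 loss are computed over binary label matrices:

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Average fraction of individual labels predicted incorrectly."""
    return np.mean(Y_true != Y_pred)

def subset_zero_one_loss(Y_true, Y_pred):
    """Fraction of instances whose predicted label set is not an exact match."""
    return np.mean(np.any(Y_true != Y_pred, axis=1))

# Illustrative example: 3 instances, 4 labels each.
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0],    # one label wrong
                   [0, 1, 0, 0],    # exact match
                   [0, 1, 0, 0]])   # two labels wrong

print(hamming_loss(Y_true, Y_pred))          # 3 wrong labels / 12 = 0.25
print(subset_zero_one_loss(Y_true, Y_pred))  # 2 of 3 rows imperfect ~ 0.667
```

The toy example hints at why the two losses can diverge, which is what the regret analysis quantifies: a single mislabeled instance already incurs the full subset 0/1 penalty, whereas the Hamming loss charges only 1/m per wrong label for m labels.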