On the stratification of multi-label data

Authors:
Konstantinos Sechidis; Grigorios Tsoumakas; Ioannis Vlahavas
Affiliations:
Dept. of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece (all authors)
Venue:
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part III
Year:
2011

Abstract

Stratified sampling takes into account the existence of disjoint groups within a population and produces samples in which the proportions of these groups are maintained. In single-label classification tasks, the groups are defined by the value of the target variable. In multi-label learning tasks, however, there are multiple target variables, and it is not clear how stratified sampling could or should be performed. This paper investigates stratification in the context of multi-label data. It considers two stratification methods for multi-label data and empirically compares them, along with random sampling, on a number of datasets and according to a number of evaluation criteria. The results lead to conclusions about the utility of each method for particular types of multi-label datasets.
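
To make the idea of stratified sampling concrete, the following is a minimal Python sketch. It is not the paper's implementation. The function names `stratified_folds` and `labelset_groups` are hypothetical, and treating each distinct combination of labels as its own group is only one possible reading of stratification for multi-label data, assumed here for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(groups, k, seed=0):
    """Assign example indices to k folds while keeping group proportions.

    `groups` is a list whose i-th entry is the (hashable) group of example i.
    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for group in sorted(by_group, key=repr):
        members = by_group[group]
        rng.shuffle(members)
        # Deal members round-robin, continuing from where the previous group
        # stopped, so leftover examples do not always land in the first folds.
        for position, index in enumerate(members):
            folds[(offset + position) % k].append(index)
        offset += len(members)
    return folds


def labelset_groups(label_matrix):
    """Assumption: each distinct combination of labels forms its own group."""
    return [tuple(row) for row in label_matrix]


# Single-label case: groups are the values of the target variable.
single = ["a"] * 6 + ["b"] * 3
single_folds = stratified_folds(single, k=3)

# Multi-label case: each row holds 0/1 indicators for two labels.
multi = [[1, 0]] * 4 + [[1, 1]] * 2 + [[0, 1]] * 2
multi_folds = stratified_folds(labelset_groups(multi), k=2)
```

In the single-label example every fold keeps the 2:1 ratio of "a" to "b". In the multi-label example the same routine works only because the toy data has few distinct label combinations. With many labels the number of combinations can grow quickly and some groups may contain a single example, which is one reason the question of how to stratify multi-label data is not obvious.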