On the stratification of multi-label data

  • Authors:
  • Konstantinos Sechidis; Grigorios Tsoumakas; Ioannis Vlahavas

  • Affiliations:
  • Dept. of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece (all three authors)

  • Venue:
  • ECML PKDD'11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part III
  • Year:
  • 2011

Abstract

Stratified sampling is a sampling method that accounts for disjoint groups within a population and produces samples in which the proportions of these groups are maintained. In single-label classification tasks, the groups are defined by the value of the target variable. In multi-label learning tasks, however, there are multiple target variables, and it is not clear how stratified sampling could or should be performed. This paper investigates stratification in the context of multi-label data. It considers two stratification methods for multi-label data and empirically compares them, along with random sampling, on a number of datasets using a number of evaluation criteria. The results yield conclusions about the utility of each method for particular types of multi-label datasets.
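
To make the single-label case concrete, the following is a minimal sketch of stratified partitioning into folds, written for illustration only. The function name `stratified_folds`, its parameters, and the toy labels are assumptions introduced here and do not come from the paper. It groups examples by the value of the target variable and deals each group out across the folds so that every fold keeps roughly the original class proportions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Hypothetical helper: split example indices into k folds while
    keeping the proportion of each class roughly equal across folds.

    Assumes each example has exactly one label, and that labels are
    hashable and sortable (e.g. strings or ints).
    """
    rng = random.Random(seed)

    # Group example indices by the value of the target variable.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # running counter so leftovers are spread over all folds
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        # Deal the members of this class out to the folds round-robin.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


# Toy data (an assumption for illustration): 8 examples of class "a", 4 of "b".
labels = ["a"] * 8 + ["b"] * 4
folds = stratified_folds(labels, k=4)
for fold in folds:
    print(sorted(labels[i] for i in fold))
```

With a single target variable the groups are disjoint, so dealing them out one class at a time is enough. With several labels per example the groups overlap, and this grouping step no longer has an obvious definition, which is the difficulty the abstract describes. The sketch does not attempt to reproduce either of the two methods the paper compares.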