On Data and Algorithms: Understanding Inductive Performance

  • Authors:
  • Alexandros Kalousis; João Gama; Melanie Hilario

  • Affiliations:
  • University of Geneva, Computer Science Department, 24, rue du General Dufour, CH-1211 Geneva 4, Switzerland. kalousis@cui.unige.ch; LIACC, FEP, University of Porto, Rua Campo Alegre 823, 4150 Porto, Portugal. jgama@liacc.up.pt; University of Geneva, Computer Science Department, 24, rue du General Dufour, CH-1211 Geneva 4, Switzerland. hilario@cui.unige.ch

  • Venue:
  • Machine Learning
  • Year:
  • 2004

Abstract

In this paper we address two symmetrical issues: the discovery of similarities among classification algorithms and the discovery of similarities among datasets. Both are approached on the basis of error measures, which we use to define the error correlation between two algorithms and to determine the relative performance of a list of algorithms. We use the first to discover similarities between learners, and both of them to discover similarities between datasets. The dataset similarities sketch maps on the dataset space. Regions within each map exhibit specific patterns of error correlation or relative performance. To understand the factors that determine these regions, we describe them using simple characteristics of the datasets. Each region is described in terms of the distributions of dataset characteristics within it.
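The abstract does not give the exact definitions of its two error measures, so the following is only a minimal illustrative sketch under stated assumptions. It assumes that error correlation can be approximated by the Pearson correlation of two learners' 0/1 error indicators on the same test instances, and that relative performance can be approximated by ordering algorithms by error rate. The function names `error_indicators`, `error_correlation` and `rank_by_error` are hypothetical and do not come from the paper.

```python
from math import sqrt


def error_indicators(y_true, y_pred):
    """Return 1 where a prediction is wrong and 0 where it is right."""
    return [int(t != p) for t, p in zip(y_true, y_pred)]


def error_correlation(y_true, pred_a, pred_b):
    """Pearson correlation of two learners' 0/1 error indicators.

    Assumption: this is one plausible reading of "error correlation",
    not necessarily the definition used in the paper.
    """
    ea = error_indicators(y_true, pred_a)
    eb = error_indicators(y_true, pred_b)
    n = len(ea)
    mean_a = sum(ea) / n
    mean_b = sum(eb) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ea, eb))
    var_a = sum((a - mean_a) ** 2 for a in ea)
    var_b = sum((b - mean_b) ** 2 for b in eb)
    if var_a == 0 or var_b == 0:
        # Undefined when a learner is always right or always wrong.
        return float("nan")
    return cov / sqrt(var_a * var_b)


def rank_by_error(error_rates):
    """Order algorithm names from lowest to highest error rate.

    Assumption: a simple stand-in for "relative performance of a list
    of algorithms"; the paper may use a different ranking scheme.
    """
    return sorted(error_rates, key=error_rates.get)


if __name__ == "__main__":
    # Toy labels and predictions, invented purely for illustration.
    y = [0, 1, 0, 1]
    learner_a = [0, 1, 1, 1]
    learner_b = [1, 1, 0, 1]
    print(error_correlation(y, learner_a, learner_b))
    print(rank_by_error({"learner_a": 0.25, "learner_b": 0.30}))
```

Under these assumptions, two learners that err on the same instances have a correlation near 1, while learners that err on different instances have a correlation near or below 0. Computing such values per dataset gives one possible way to compare datasets by their pattern of error correlation.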