On Data and Algorithms: Understanding Inductive Performance

Authors:
Alexandros Kalousis; João Gama; Melanie Hilario
Affiliations:
Alexandros Kalousis: University of Geneva, Computer Science Department, 24, rue du General Dufour, CH-1211 Geneva 4, Switzerland (kalousis@cui.unige.ch)
João Gama: LIACC, FEP, University of Porto, Rua Campo Alegre 823, 4150 Porto, Portugal (jgama@liacc.up.pt)
Melanie Hilario: University of Geneva, Computer Science Department, 24, rue du General Dufour, CH-1211 Geneva 4, Switzerland (hilario@cui.unige.ch)
Venue:
Machine Learning
Year:
2004

Abstract

In this paper we address two symmetrical issues: the discovery of similarities among classification algorithms and the discovery of similarities among datasets. Both rely on error measures, which we use to define the error correlation between two algorithms and to determine the relative performance of a list of algorithms. We use the former to discover similarities between learners, and both to discover similarities between datasets. The similarities between datasets sketch maps of the dataset space, in which distinct regions exhibit specific patterns of error correlation or relative performance. To understand the factors that determine these regions, we describe them using simple characteristics of the datasets: each region is described in terms of the distributions of dataset characteristics within it.
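The two error-based measures the abstract relies on can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' procedure: error correlation is taken here as the fraction of test instances on which both classifiers predict the same wrong class, and relative performance is simplified to ranking algorithms by error rate, with tied algorithms sharing the average rank. The function names `error_correlation`, `error_correlation_matrix` and `rank_by_error` are hypothetical.

```python
from itertools import combinations


def error_correlation(y_true, pred_a, pred_b):
    """Fraction of instances on which both classifiers make the same wrong prediction.

    Assumed definition for illustration; the paper's exact formula may differ.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one instance")
    same_error = sum(
        1 for y, a, b in zip(y_true, pred_a, pred_b) if a == b and a != y
    )
    return same_error / len(y_true)


def error_correlation_matrix(y_true, predictions):
    """Pairwise error correlations for a dict {algorithm_name: predictions}."""
    return {
        (a, b): error_correlation(y_true, predictions[a], predictions[b])
        for a, b in combinations(sorted(predictions), 2)
    }


def rank_by_error(errors):
    """Map each algorithm to its rank (1 = lowest error); ties share the average rank.

    A simplification of "relative performance"; no significance testing is done.
    """
    ordered = sorted(errors, key=errors.get)  # stable sort by error rate
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of algorithms tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and errors[ordered[j + 1]] == errors[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for name in ordered[i:j + 1]:
            ranks[name] = average_rank
        i = j + 1
    return ranks
```

For example, with labels `[0, 1, 1, 0, 1]` and predictions `[0, 0, 1, 1, 1]` and `[1, 0, 1, 1, 0]`, the two classifiers share two identical errors out of five instances, so `error_correlation` returns `0.4`. Calling `rank_by_error({"c45": 0.20, "nb": 0.25, "knn": 0.20})` (hypothetical algorithm names and error rates) assigns rank 1.5 to the two tied algorithms and rank 3.0 to the third.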