Mining the hidden structure of inductive learning data sets

Authors:
Nathalie Japkowicz
Affiliations:
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada
Venue:
Canadian AI'12 Proceedings of the 25th Canadian conference on Advances in Artificial Intelligence
Year:
2012

Abstract

This paper proposes a method for extracting the hidden characteristics of machine learning domains. It does so by evaluating the performance of various classifiers on these domains as well as on artificial data whose characteristics are known, since they were purposely included in the generation process. The results obtained on both the real and artificial data are analyzed together using a dendrogram, a classical visualization tool for hierarchical clustering. The idea is to map the real-world domains to the artificial ones according to how well each is learnt by a variety of classifiers and, through this relationship, to extract the characteristics of the real-world domains. The method is able to determine how difficult it is to classify a specific domain and whether this difficulty stems from the complexity of the concept it embodies, the amount of overlap between classes, the dearth of training data, or the dimensionality of the data. This is an important contribution, as it allows researchers to understand the underlying nature of their data and thus to converge quickly toward novel, well-adapted solutions to their particular problems.
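The mapping idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which classifiers, performance measures, distance, or linkage were used, so the sketch assumes accuracy vectors compared by Euclidean distance and merged with average linkage. The domain names and accuracy values are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations
from math import dist

# Hypothetical accuracy vectors, one entry per classifier.
# These numbers are placeholders for illustration only.
ARTIFICIAL = {
    "art_simple": (0.95, 0.94, 0.96),   # assumed: easy concept, little overlap
    "art_overlap": (0.70, 0.72, 0.69),  # assumed: heavy class overlap
    "art_complex": (0.60, 0.85, 0.90),  # assumed: complex concept
}
REAL = {
    "real_A": (0.93, 0.95, 0.94),
    "real_B": (0.62, 0.83, 0.88),
}


def average_linkage(c1, c2, points):
    """Mean Euclidean distance over all cross-cluster pairs."""
    total = sum(dist(points[a], points[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points):
    """Average-linkage clustering; returns the merge history (a dendrogram as a list)."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        # Pick the two closest clusters and merge them.
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        height = average_linkage(c1, c2, points)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), height))
    return merges


def closest_artificial(real_name, merges, artificial):
    """Artificial domains in the earliest merged cluster that contains the real domain."""
    for left, right, _ in merges:
        merged = set(left) | set(right)
        if real_name in merged:
            hits = sorted(merged & set(artificial))
            if hits:
                return hits
    return []


merges = agglomerate({**ARTIFICIAL, **REAL})
for name in REAL:
    print(name, "->", closest_artificial(name, merges, ARTIFICIAL))
```

Under these assumptions, a real-world domain inherits the known characteristics of the artificial domains it is first clustered with. In practice, the accuracy vectors would come from actually training the classifiers on each domain, and a plotting library would draw the dendrogram from the merge history.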