An n-spheres based synthetic data generator for supervised classification

Authors:
Javier Sánchez-Monedero;Pedro Antonio Gutiérrez;María Pérez-Ortiz;César Hervás-Martínez
Affiliations:
Dept. of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain;Dept. of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain;Dept. of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain;Dept. of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain
Venue:
IWANN'13 Proceedings of the 12th international conference on Artificial Neural Networks: advances in computational intelligence - Volume Part I
Year:
2013

Abstract

Synthetic datasets can be useful in a variety of situations, particularly when new machine learning models and training algorithms are being developed or when probing the weaknesses of a specific method. In contrast to real-world data, synthetic datasets provide a controlled environment for analysing specific critical issues such as outlier tolerance, the influence of data dimensionality and class imbalance, among others. In this paper, a framework for synthetic data generation is developed with special attention to the ordering of patterns in the space, data dimensionality, class overlap and data multimodality. Variables such as the position, width and overlap of data distributions in the n-dimensional space are controlled by treating the distributions as n-spheres. The method is tested in the context of ordinal regression, a classification paradigm in which the categories follow a natural order. The contribution of the paper is full control over the data topology and over a set of relevant statistical properties of the data.
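
As a rough illustration of the idea of controlling position, width and overlap through n-spheres, the following minimal sketch draws each class uniformly from its own n-ball and places the ball centers in class order along the first axis. It is not the authors' generator: the function names (`sample_in_ball`, `make_ordinal_spheres`), the uniform-in-ball sampling, the single ball per class and the `overlap` parameterisation are all assumptions made for this example.

```python
import math
import random


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the n-ball with the given center and radius."""
    n = len(center)
    # A normalised Gaussian vector gives a uniformly distributed direction.
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction))
    # Scaling by u**(1/n) makes the point uniform in volume, not in radius.
    r = radius * rng.random() ** (1.0 / n)
    return [c + r * d / norm for c, d in zip(center, direction)]


def make_ordinal_spheres(n_classes, n_dims, per_class, radius=1.0, overlap=0.0, seed=0):
    """Hypothetical generator: one ball per class, centers ordered along axis 0."""
    rng = random.Random(seed)
    # Assumed convention: overlap=0 makes neighbouring balls tangent,
    # overlap=1 makes them coincide.
    step = 2.0 * radius * (1.0 - overlap)
    X, y = [], []
    for k in range(n_classes):
        center = [k * step] + [0.0] * (n_dims - 1)
        for _ in range(per_class):
            X.append(sample_in_ball(center, radius, rng))
            y.append(k)
    return X, y


X, y = make_ordinal_spheres(n_classes=4, n_dims=5, per_class=50, overlap=0.25)
```

Here `radius` plays the role of the distribution width, `overlap` shrinks the distance between consecutive class centers, and the class label `k` fixes the order of the classes in space. Multimodal classes could be approximated in the same spirit by assigning several balls to one label.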