On sampling strategies for small and continuous data with the modeling of genetic programming and adaptive neuro-fuzzy inference system

Authors:
S. Sen; E. A. Sezer; C. Gokceoglu; S. Yagiz
Affiliations:
Department of Computer Engineering, Hacettepe University, Ankara, Turkey; Department of Computer Engineering, Hacettepe University, Ankara, Turkey; Department of Geological Engineering, Hacettepe University, Ankara, Turkey; Department of Geological Engineering, Pamukkale University, Denizli, Turkey
Venue:
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - FUZZYSS'2011: 2nd International Fuzzy Systems Symposium
Year:
2012


Abstract

Sampling strategies, which play a significant role in examining data characteristics such as imbalanced, small, or exhaustive data, have been discussed in the literature for the last couple of decades. In this study, the sampling problem encountered with small and continuous data sets is examined. Sampling with measured data by employing k-fold cross validation, and sampling with synthetic data generated by fuzzy c-means clustering, are applied, and the performances of genetic programming (GP) and the adaptive neuro-fuzzy inference system (ANFIS) on these data sets are then discussed. The experimental results indicate that fuzzy c-means based synthetic sampling is more successful than k-fold cross validation when modeling small and continuous data sets with ANFIS and GP, so it can be proposed for these types of data sets. Additionally, ANFIS shows slightly better performance than GP when synthetic data is employed, but GP is less sensitive to the data set and produces outputs within a narrower range than ANFIS's outputs when k-fold cross validation is employed.
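
The two sampling building blocks named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the fuzzifier `m = 2.0`, the random initialisation, and the stopping rule are all assumptions, and the abstract does not say how synthetic samples are derived from the fuzzy c-means result, so that step is deliberately left out.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_idx, test_idx) pairs for k-fold cross validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Striding gives k disjoint folds whose sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, memberships). memberships[i][j] is the degree to
    which point i belongs to cluster j; every row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Assumed initialisation: random memberships, normalised per point.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: normalised inverse distances to the centers.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

Under these assumptions, `k_fold_indices(len(data), k)` yields the train/test partitions of the measured data, while `fuzzy_c_means(data, c)` returns cluster centers and soft memberships from which a synthetic sampling scheme could start. Both models (GP and ANFIS) would then be trained and evaluated on each variant.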