A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets

Authors:
Der-Chiang Li;Chiao-Wen Liu;Susan C. Hu
Affiliations:
Department of Industrial and Information Management, National Cheng Kung University, 1, University Road, Tainan 70101, Taiwan;Department of Industrial and Information Management, National Cheng Kung University, 1, University Road, Tainan 70101, Taiwan;Department of Public Health, College of Medicine, National Cheng Kung University, 1, University Road, Tainan 70101, Taiwan
Venue:
Artificial Intelligence in Medicine
Year:
2011

Abstract

Objective: Medical data sets are usually small and have very high dimensionality. Too many attributes make the analysis less efficient without necessarily increasing accuracy, while too few samples reduce the stability of the model. The main objective of this study is therefore to extract the optimal subset of features to increase analytical performance when the data set is small.

Methods: This paper proposes a fuzzy-based non-linear transformation method that extends the classification-related information carried by the original attribute values of a small data set. Principal component analysis (PCA) is then applied to the transformed data set to extract the optimal subset of features. Finally, the transformed data with these optimal features serve as the input to a learning tool, a support vector machine (SVM). Six medical data sets are used to illustrate the approach: Pima Indians diabetes, Wisconsin diagnostic breast cancer, Parkinson's disease, echocardiogram, BUPA liver disorders, and bladder cancer cases in Taiwan.

Results: A t-test is used to evaluate the classification accuracy on each single data set, and the Friedman test is used to show that the proposed method is better than the other methods across the multiple data sets. The experimental results indicate that the proposed method has better classification performance than either PCA or kernel principal component analysis (KPCA) when the data set is small, and suggest that creating new purpose-related information improves analytical performance.

Conclusion: This paper has shown that feature extraction, as a function of feature selection, is important for efficient data analysis. When the data set is small, using the fuzzy-based transformation method presented in this work to increase the available information produces better results than the PCA and KPCA approaches.
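
The abstract does not spell out the fuzzy-based transformation itself, so the following is only a minimal sketch of the general idea of extending each attribute value with class-related information. It assumes, purely for illustration, a triangular membership function per class and attribute, built from the minimum, mean, and maximum of the training values. The function names (`fit_triangles`, `membership`, `transform`) and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fit_triangles(X, y):
    """For each class and attribute, store (min, mean, max) of the training values.

    Assumption: a triangle with these three points stands in for the
    paper's fuzzy membership function, which the abstract does not define.
    """
    params = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        columns = list(zip(*rows))
        params[label] = [(min(c), mean(c), max(c)) for c in columns]
    return params


def membership(v, lo, mid, hi):
    """Triangular membership: 1 at mid, falling linearly to 0 at lo and hi."""
    if v <= lo or v >= hi:
        # Only a degenerate triangle (lo == mid == hi) can have its peak here.
        return 1.0 if v == mid else 0.0
    if v <= mid:
        return (v - lo) / (mid - lo)
    return (hi - v) / (hi - mid)


def transform(X, params):
    """Append one membership value per (class, attribute) to every row."""
    out = []
    for row in X:
        new_row = list(row)
        for label in sorted(params):
            new_row += [membership(v, *tri) for v, tri in zip(row, params[label])]
        out.append(new_row)
    return out


# Toy data: 6 samples, 2 attributes, 2 classes (hypothetical values).
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0],
           [6.0, 20.0], [7.0, 22.0], [8.0, 24.0]]
y_train = [0, 0, 0, 1, 1, 1]

params = fit_triangles(X_train, y_train)
print(transform([[2.5, 13.0]], params))
# → [[2.5, 13.0, 0.5, 0.5, 0.0, 0.0]]
```

With two attributes and two classes, each row grows from 2 to 6 columns. In the pipeline the abstract describes, a widened matrix of this kind would then be reduced with PCA and passed to an SVM, for instance via scikit-learn's `PCA` and `SVC` classes. That pairing is likewise an assumption about tooling, not something the paper states.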