The data complexity index to construct an efficient cross-validation method

Authors:
Der-Chiang Li; Yao-Hwei Fang; Y.M. Frank Fang
Affiliations:
Department of Industrial and Information Management, National Cheng Kung University, Taiwan; Division of Biostatistics and Bioinformatics, National Health Research Institutes, Taiwan; Geographic Information System Research Center, Feng Chia University, Taiwan
Venue:
Decision Support Systems
Year:
2010

Abstract

Cross-validation is a widely used model evaluation method in data mining applications. However, determining appropriate parameter values for a valid evaluation, such as the training data size and the number of experiment runs, usually requires considerable effort. This study develops an efficient cross-validation method for binary classification problems, called Complexity-based Efficient (CBE) cross-validation. CBE cross-validation establishes a complexity index, the CBE index, by exploring the geometric structure and noise of the data. The CBE index is then used to calculate the optimal training data size and number of experiment runs, reducing model evaluation time for computationally expensive classification data sets. One simulated data set and three real data sets are used to validate the performance of the proposed method, which is compared with repeated random sub-sampling validation and K-fold cross-validation. The results show that the three methods achieve similar validation performance, while CBE cross-validation requires less training time than the other two.
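The two baseline methods the abstract compares against can be sketched in plain Python to show how each one partitions the data. This is a minimal, hypothetical illustration: the function names (`k_fold_splits`, `random_subsampling_splits`, `evaluate`), the toy labels and the majority-class scorer are assumptions made for the example, and the CBE index itself is not reproduced because the abstract does not give its formula.

```python
import random
from typing import Callable, Iterable, Iterator, List, Tuple

# A split is (training indices, test indices).
Split = Tuple[List[int], List[int]]


def k_fold_splits(n: int, k: int, seed: int = 0) -> Iterator[Split]:
    """K-fold CV: shuffle once, cut into k near-equal folds; each fold is the test set exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample, so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def random_subsampling_splits(n: int, train_size: int, runs: int, seed: int = 0) -> Iterator[Split]:
    """Repeated random sub-sampling: each run draws an independent random train/test partition."""
    if not 0 < train_size < n:
        raise ValueError("train_size must satisfy 0 < train_size < n")
    rng = random.Random(seed)
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[:train_size], idx[train_size:]


def evaluate(splits: Iterable[Split],
             fit_and_score: Callable[[List[int], List[int]], float]) -> float:
    """Average fit_and_score(train, test) over all splits; each split costs one model fit."""
    scores = [fit_and_score(train, test) for train, test in splits]
    if not scores:
        raise ValueError("no splits to evaluate")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy binary labels, for illustration only.
    y = [0] * 30 + [1] * 10

    def majority_baseline(train: List[int], test: List[int]) -> float:
        # "Train" by taking the majority label of the training subset, then score accuracy on the test subset.
        ones = sum(y[i] for i in train)
        pred = 1 if ones * 2 > len(train) else 0
        return sum(y[i] == pred for i in test) / len(test)

    print(evaluate(k_fold_splits(len(y), k=5), majority_baseline))
    print(evaluate(random_subsampling_splits(len(y), train_size=28, runs=10), majority_baseline))
```

In both sketches the total evaluation cost is roughly the number of splits (k folds, or `runs`) times the cost of one fit on the chosen training size. Those two quantities are the parameters that, according to the abstract, the CBE index is used to set; the rule for choosing them is described in the paper and is not reproduced here.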