An empirical analysis of under-sampling techniques to balance a protein structural class dataset

Authors:
Marcilio C. P. de Souto;Valnaide G. Bittencourt;Jose A. F. Costa
Affiliations:
Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal-RN, Brazil;Department of Computing and Automation, Federal University of Rio Grande do Norte, Natal-RN, Brazil;Department of Electric Engineering, Federal University of Rio Grande do Norte, Natal-RN, Brazil
Venue:
ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
Year:
2006

Abstract

There has been a great deal of research on learning from imbalanced datasets. The most widely used methods proposed for this problem are based on either under-sampling or over-sampling the original dataset. In this work, we evaluate several under-sampling methods, such as Tomek Links, with the goal of improving the performance of classifiers generated by different ML algorithms (decision trees, support vector machines, among others) applied to the problem of determining the structural similarity of proteins.
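
As an illustration of the kind of under-sampling the abstract refers to, the following is a minimal sketch of Tomek-link removal. It is not the authors' implementation. It assumes Euclidean distance and the common definition of a Tomek link as a pair of examples from different classes that are each other's nearest neighbor. The function names and the toy data are hypothetical.

```python
from math import dist


def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (Euclidean), excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def undersample_tomek(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link (assumed variant)."""
    drop = set()
    for i, j in tomek_links(points, labels):
        for k in (i, j):
            if labels[k] == majority_label:
                drop.add(k)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data, made up for illustration: label 0 is the majority class.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.2, 5.0), (9.0, 9.0)]
labels = [0, 0, 0, 0, 1, 1]

links = tomek_links(points, labels)
new_points, new_labels = undersample_tomek(points, labels, majority_label=0)
```

On this toy data the only Tomek link is the pair at indices 3 and 4, so the majority-class point (5.0, 5.0) is removed and the class ratio moves from 4:2 to 3:2. Removing only the majority-class member is one design choice; a data-cleaning variant would remove both members of each link.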