An empirical analysis of under-sampling techniques to balance a protein structural class dataset

  • Authors: Marcilio C. P. de Souto; Valnaide G. Bittencourt; Jose A. F. Costa

  • Affiliations: Department of Informatics and Applied Mathematics, Federal University of Rio Grande do Norte, Natal-RN, Brazil; Department of Computing and Automation, Federal University of Rio Grande do Norte, Natal-RN, Brazil; Department of Electric Engineering, Federal University of Rio Grande do Norte, Natal-RN, Brazil

  • Venue: ICONIP'06 Proceedings of the 13th international conference on Neural information processing - Volume Part III
  • Year: 2006

Abstract

There has been a great deal of research on learning from imbalanced datasets. Among the methods proposed for this problem, the most common rely on either under-sampling or over-sampling the original dataset. In this work, we evaluate several under-sampling methods, such as Tomek Links, with the goal of improving the performance of the classifiers generated by different machine learning (ML) algorithms (decision trees and support vector machines, among others) applied to the problem of determining the structural similarity of proteins.
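
For readers unfamiliar with Tomek Links, the idea can be illustrated with a minimal sketch. A Tomek link is a pair of examples from different classes that are each other's nearest neighbour; under-sampling removes the majority-class member of each such pair. The abstract does not describe the authors' implementation, so everything below is an assumption made for illustration: the function name `tomek_link_undersample`, the `majority_label` parameter, and the use of Euclidean distance are all hypothetical choices.

```python
import numpy as np


def tomek_link_undersample(X, y, majority_label):
    """Drop majority-class examples that take part in a Tomek link.

    Illustrative sketch only: Euclidean distance and this interface are
    assumptions, not details taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared Euclidean distances (O(n^2) memory, fine for small data).
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour

    nn = dist.argmin(axis=1)        # index of each point's nearest neighbour
    idx = np.arange(len(X))

    mutual = nn[nn] == idx          # i and nn[i] are each other's nearest neighbour
    opposite = y != y[nn]           # the two points carry different labels
    drop = mutual & opposite & (y == majority_label)

    keep = ~drop
    return X[keep], y[keep]


# Toy one-dimensional data: class 0 is the majority, class 1 the minority.
X_toy = [[0.0], [1.0], [1.2], [5.0], [6.0]]
y_toy = [0, 0, 1, 0, 0]
X_res, y_res = tomek_link_undersample(X_toy, y_toy, majority_label=0)
```

In the toy data, the points at 1.0 (class 0) and 1.2 (class 1) form the only Tomek link, so the majority-class point at 1.0 is removed and the minority example is kept.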