SRF: a framework for the study of classifier behavior under training set mislabeling noise

Authors:
Katsiaryna Mirylenka;George Giannakopoulos;Themis Palpanas
Affiliations:
University of Trento, Italy;Institute of Informatics and Telecommunications of NCSR Demokritos, Greece;University of Trento, Italy
Venue:
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Year:
2012



Abstract

Machine learning algorithms perform differently under varying levels of mislabeling noise in the training set, so choosing a suitable algorithm for a particular learning problem is crucial. In this paper, we introduce the "Sigmoid Rule" Framework (SRF), which describes classifier behavior in noisy settings. The framework builds on an existing model that expresses the expected performance of a learning algorithm as a sigmoid function of the signal-to-noise ratio in the training instances. We study the parameters of this sigmoid function using five classifiers: Naive Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier. The study leads to intuitive criteria, defined on the sigmoid parameters, for comparing the behavior of learning algorithms in the presence of varying levels of noise. Furthermore, we show that these parameters are connected to the characteristics of the underlying dataset, which hints at how the inherent properties of a dataset affect learning. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and to the mining of noisy data series, as in sensor networks.
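As a rough illustration of the idea in the abstract, the sketch below models expected performance as a sigmoid of a signal-to-noise level and injects mislabeling noise into a list of training labels. The abstract does not give the functional form or the noise definition, so everything here is an assumption made for illustration: the four-parameter sigmoid (`low`, `high`, `slope`, `shift`), the smoothed log-ratio used as the signal-to-noise measure, and all function names.

```python
import math
import random


def sigmoid_performance(z, low, high, slope, shift):
    """Expected performance at signal-to-noise level z.

    Assumed parametrization (not taken from the paper): `low` and `high`
    are the performance floor and ceiling, `slope` controls how fast
    performance changes, and `shift` is the midpoint of the transition.
    """
    return low + (high - low) / (1.0 + math.exp(-slope * (z - shift)))


def flip_labels(labels, noise_rate, classes, rng):
    """Copy `labels`, replacing each one with a different class
    with probability `noise_rate` (simulated mislabeling noise)."""
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


def log_snr(clean, noisy):
    """Assumed signal-to-noise measure: log(correct / mislabeled),
    with +1 smoothing so that a noise-free set does not divide by zero."""
    wrong = sum(1 for a, b in zip(clean, noisy) if a != b)
    right = len(clean) - wrong
    return math.log((right + 1) / (wrong + 1))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [i % 2 for i in range(1000)]
    for rate in (0.0, 0.1, 0.3, 0.5):
        noisy = flip_labels(clean, rate, classes=[0, 1], rng=rng)
        z = log_snr(clean, noisy)
        # Hypothetical parameter values, chosen only to show the curve shape.
        perf = sigmoid_performance(z, low=0.5, high=0.9, slope=1.5, shift=1.0)
        print(f"noise={rate:.1f}  log-SNR={z:.2f}  modeled performance={perf:.3f}")
```

In such a sketch, two classifiers could be compared through their fitted parameters, for example a higher `high` for better performance on clean data or a smaller `shift` for earlier recovery as noise decreases. These readings are illustrative and are not the criteria defined in the paper.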