Relabeling algorithm for retrieval of noisy instances and improving prediction quality

Authors:
Shital Shah;Andrew Kusiak
Affiliations:
Health Systems Management, Rush University Medical Center, 1700 W. Van Buren St, 126B, Chicago, Illinois 60612, USA;Intelligent Systems Laboratory, The University of Iowa, 2139 Seamans Center, Iowa City, IA 52242-1527, USA
Venue:
Computers in Biology and Medicine
Year:
2010

Abstract

A relabeling algorithm for retrieval of noisy instances with binary outcomes is presented. The algorithm iteratively retrieves, selects, and relabels data instances (i.e., transforms the decision space) to improve prediction quality. It emphasizes knowledge generalization and confidence rather than classification accuracy. A confidence index incorporating classification accuracy, prediction error, impurities in the relabeled dataset, and cluster purities was designed. The proposed approach is illustrated with a binary-outcome dataset and was successfully tested on four standard benchmark datasets from the UCI repository as well as on bladder cancer immunotherapy data. For each application, a subset of the most stable instances (7% to 51% of the sample) with high confidence (between 64% and 99.44%) was identified, along with the noisiest instances. Domain experts and the extracted knowledge validated the relabeled instances and the corresponding confidence indexes. With some modifications, the relabeling algorithm can be applied to other medical, industrial, and service domains.
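
The abstract does not give the algorithm's details, so the following is only a minimal illustrative sketch of the general idea of iterative relabeling for binary outcomes, not the authors' method. It assumes a k-nearest-neighbor vote as a stand-in for the retrieve and select steps, and it does not implement the paper's confidence index. The names `knn_vote` and `relabel` and the parameters `k`, `threshold`, and `max_iter` are hypothetical.

```python
from collections import Counter


def knn_vote(points, labels, i, k):
    """Majority label and agreement fraction among the k nearest neighbors of point i."""
    # Squared Euclidean distance to every other point, nearest first.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
        for j in range(len(points))
        if j != i
    )
    nearest = [j for _, j in dists[:k]]
    votes = Counter(labels[j] for j in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def relabel(points, labels, k=3, threshold=1.0, max_iter=10):
    """Iteratively flip labels that the neighborhood contradicts (assumed rule).

    Returns the relabeled list and the sorted indices of instances whose
    label was changed at least once (the suspected noisy instances).
    """
    labels = list(labels)
    flipped = set()
    for _ in range(max_iter):
        # Retrieve: collect neighbor votes using the labels of this iteration.
        votes = [knn_vote(points, labels, i, k) for i in range(len(points))]
        # Select: instances whose neighbors disagree with them strongly enough.
        changes = [
            i for i, (lab, agree) in enumerate(votes)
            if lab != labels[i] and agree >= threshold
        ]
        if not changes:
            break  # labels are stable, stop iterating
        # Relabel: apply all selected changes at once.
        for i in changes:
            labels[i] = votes[i][0]
            flipped.add(i)
    return labels, sorted(flipped)


# Toy data: two well-separated 1-D clusters with one mislabeled instance (index 2).
points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
noisy_labels = [0, 0, 1, 0, 1, 1, 1, 1]
clean_labels, suspected_noise = relabel(points, noisy_labels)
```

In this toy run, only the instance at index 2 is unanimously contradicted by its three nearest neighbors, so it is the only one relabeled. Requiring full agreement (`threshold=1.0`) is a deliberately conservative choice for the sketch; the paper's own selection criteria may differ.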