A new co-training-style random forest for computer aided diagnosis

Authors:
Chao Deng;M. Zu Guo
Affiliations:
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China and China Mobile Research Institute, Beijing, China;School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Venue:
Journal of Intelligent Information Systems
Year:
2011

Abstract

Machine learning techniques used in computer-aided diagnosis (CAD) systems learn a hypothesis that helps medical experts make future diagnoses. Learning a well-performing hypothesis requires a large number of expert-diagnosed examples, which places a heavy burden on the experts. By exploiting large amounts of undiagnosed examples and the power of ensemble learning, the co-training-style random forest (Co-Forest) relieves this burden and produces well-performing hypotheses. However, Co-Forest may suffer from a problem common to co-training-style algorithms: unlabeled examples may be labeled incorrectly, and these mislabeled examples accumulate during training. This happens because the limited number of originally labeled examples usually produces poor component classifiers that lack diversity and accuracy. In this paper, a new Co-Forest algorithm named Co-Forest with Adaptive Data Editing (ADE-Co-Forest) is proposed. It exploits a specific data-editing technique to identify and discard possibly mislabeled examples throughout the co-labeling iterations, and it employs an adaptive strategy to decide, case by case, whether to trigger the editing operation. The adaptive strategy combines five pre-conditional theorems, all of which ensure an iterative reduction of classification error and an increase in the scale of the new training sets under PAC learning theory. Experiments on UCI datasets and an application to small pulmonary nodule detection in chest CT images show that ADE-Co-Forest enhances the performance of the learned hypothesis more effectively than Co-Forest and DE-Co-Forest (Co-Forest with Data Editing but without the adaptive strategy).
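
To illustrate the data-editing idea described in the abstract, the following is a minimal sketch rather than the authors' method. The abstract does not name the editing technique, so this example substitutes a simple k-nearest-neighbor agreement filter: a newly labeled example is kept only if its assigned label matches the majority label of its nearest originally labeled neighbors. The function name `knn_agreement_filter`, the parameter `k`, and the toy data are all hypothetical, and the adaptive trigger based on the five theorems is not modeled here.

```python
from collections import Counter
from math import dist


def knn_agreement_filter(labeled, pseudo_labeled, k=3):
    """Keep pseudo-labeled examples whose label agrees with the majority
    label of their k nearest originally labeled neighbors.

    This is an assumed stand-in for the paper's editing step, not the
    technique the authors actually use.
    """
    kept = []
    for x, y in pseudo_labeled:
        # Find the k originally labeled examples closest to x.
        neighbors = sorted(labeled, key=lambda item: dist(item[0], x))[:k]
        # Majority label among those neighbors.
        majority, _ = Counter(label for _, label in neighbors).most_common(1)[0]
        if majority == y:
            kept.append((x, y))
    return kept


# Toy data: two well-separated classes of originally labeled examples.
labeled = [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 0),
           ((1.0, 1.0), 1), ((0.9, 1.1), 1), ((1.1, 0.9), 1)]

# Examples labeled by the ensemble during a co-labeling iteration.
pseudo = [((0.05, 0.1), 0),   # agrees with its class-0 neighbors
          ((0.95, 1.0), 0),   # conflicts with its class-1 neighbors
          ((1.0, 0.95), 1)]   # agrees with its class-1 neighbors

edited = knn_agreement_filter(labeled, pseudo, k=3)
```

In this toy run the second pseudo-labeled example is discarded because its label disagrees with its neighborhood, which is the kind of possibly mislabeled example an editing step aims to keep out of the next training set.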