In machine learning applications where multiple data sources are present, it is desirable to exploit the sources simultaneously to make better inferences. When each data source is represented as a graph, a common strategy is to combine the graphs, e.g., by summing their adjacency matrices, and then apply a standard graph-based learning algorithm. In this paper, we take an alternative approach: instead of combining the graphs, we create a graph-based learner on each graph, and each learner makes predictions independently. The method works iteratively: in each round, labels predicted by some of the learners are added to the labeled set and the models are retrained. The method thus draws on two popular semi-supervised learning approaches, bootstrapping and graph-based methods, and inherits the advantages of both. We evaluated the method on the gene function prediction problem with real biological datasets. Experiments show that it significantly outperforms a standard graph-based algorithm and compares favorably with a state-of-the-art gene function prediction method.
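The iterative scheme can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact algorithm: it assumes normalized label propagation as the per-graph learner and a simple most-confident selection rule for promoting predictions to the labeled set; the function names, the confidence rule, and the final tie-breaking by averaged scores are all choices made here for illustration.

```python
import numpy as np

def propagate(adj, labels, n_classes, alpha=0.9, iters=50):
    """Label propagation on one graph: F <- alpha * S @ F + (1 - alpha) * Y,
    where S is the symmetrically normalized adjacency matrix and Y encodes
    the current labels (label -1 means unlabeled)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    Y = np.zeros((len(labels), n_classes))
    for i, y in enumerate(labels):
        if y >= 0:
            Y[i, y] = 1.0
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

def iterative_multigraph(adjs, labels, n_classes, rounds=3, per_round=2):
    """Run one learner per graph; in each round, promote each learner's
    most confident predictions on unlabeled nodes to the labeled set."""
    labels = labels.copy()
    for _ in range(rounds):
        for adj in adjs:
            F = propagate(adj, labels, n_classes)
            unlabeled = np.where(labels < 0)[0]
            if len(unlabeled) == 0:
                return labels
            conf = F[unlabeled].max(axis=1)          # confidence proxy
            top = unlabeled[np.argsort(conf)[::-1][:per_round]]
            labels[top] = F[top].argmax(axis=1)      # self-label confident nodes
    # Label any remaining nodes from the averaged scores of all learners.
    F = sum(propagate(a, labels, n_classes) for a in adjs)
    rest = labels < 0
    labels[rest] = F[rest].argmax(axis=1)
    return labels
```

On two small synthetic graphs whose edges agree on a two-cluster structure, the learners bootstrap each other until every node carries a cluster label; in practice each graph would come from a different biological data source (e.g., co-expression, protein interaction).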