Significant research efforts toward the robust integration of information from multiple sources are being pursued at a rapid pace. However, the information in heterogeneous sources is often incomplete, so making maximum use of all the available information is a challenging problem. Most recent research on data integration has focused on cases where the information is available across all the different sources, and the resulting methods cannot effectively integrate sources in the presence of partial information. We develop an ensemble method that boosts the decisions made by different models on individual sources and obtains robust results for the task of class prediction. We propose a heterogeneous boosting framework that uses all the available information even if some of the sources provide no information about some objects. We demonstrate the effectiveness of the proposed framework on the problem of gene function prediction and compare it to state-of-the-art methods using several real-world biological datasets. We also show that the proposed method outperforms the imputation schemes that are widely used when integrating data with partial information.
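The abstract does not spell out the boosting procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes binary labels in {-1, +1}, one feature matrix per source with NaN marking objects a source does not cover, and a generic AdaBoost variant whose weak learners (one-feature decision stumps) abstain on missing objects instead of relying on imputed values. All function names and the stump learner are hypothetical choices made for illustration.

```python
import numpy as np

def stump_predict(X, feature, threshold, sign):
    """Vote +1/-1 from a one-feature stump; vote 0 (abstain) where the value is missing."""
    col = X[:, feature]
    votes = np.where(col > threshold, sign, -sign).astype(float)
    votes[np.isnan(col)] = 0.0
    return votes

def fit_abstaining_boost(sources, y, rounds=10, eps=1e-9):
    """Boost stumps drawn from several sources; NaN entries mean 'no information'."""
    y = np.asarray(y, dtype=float)
    w = np.full(len(y), 1.0 / len(y))  # uniform initial object weights
    model = []
    for _ in range(rounds):
        best = None
        for s, X in enumerate(sources):
            for f in range(X.shape[1]):
                col = X[:, f]
                for t in np.unique(col[~np.isnan(col)]):
                    for sign in (1.0, -1.0):
                        h = stump_predict(X, f, t, sign)
                        w_pos = w[h * y > 0].sum()  # weight voted correctly
                        w_neg = w[h * y < 0].sum()  # weight voted incorrectly
                        w_abs = w[h == 0].sum()     # weight abstained on
                        # Normalizer minimized by confidence-rated boosting
                        # when weak learners are allowed to abstain.
                        z = w_abs + 2.0 * np.sqrt(w_pos * w_neg)
                        if best is None or z < best[0]:
                            best = (z, s, f, t, sign, w_pos, w_neg, h)
        _, s, f, t, sign, w_pos, w_neg, h = best
        alpha = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
        model.append((s, f, t, sign, alpha))
        # Abstentions (h == 0) leave a weight unchanged before renormalization.
        w = w * np.exp(-alpha * y * h)
        w /= w.sum()
    return model

def predict(model, sources):
    """Sign of the weighted vote; sources with no data for an object contribute 0."""
    score = np.zeros(sources[0].shape[0])
    for s, f, t, sign, alpha in model:
        score += alpha * stump_predict(sources[s], f, t, sign)
    return np.where(score >= 0, 1, -1)
```

In this sketch an object missing from one source keeps its weight when that source's stump is selected, so later rounds tend to pick stumps from sources that do cover it. That is one simple way to use partial information without filling in missing values.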