Motivation. Knowing the subcellular localization of a protein is fundamental for elucidating its function. Determining the subcellular location experimentally with high-throughput procedures is difficult for eukaryotic cells. Computational procedures are therefore needed for annotating the subcellular location of proteins in large-scale genomic projects.

Results. BaCelLo is a predictor for five classes of subcellular localization (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast), based on different SVMs organized in a decision tree. The system exploits the information derived from the residue sequence and the evolutionary information contained in alignment profiles. It analyzes the composition of the whole sequence and the compositions of both the N- and C-termini. The training set is curated to avoid redundancy. For the first time, a balancing procedure is introduced to mitigate the effect of biased training sets. Three kingdom-specific predictors are implemented, for animals, plants and fungi, respectively. When proteins from animals and fungi are distributed into four classes, the accuracy of BaCelLo reaches 74% and 76%, respectively; a score of 67% is obtained when proteins from plants are distributed into five classes. BaCelLo outperforms the other presently available methods for the same task and gives more balanced accuracy and coverage values for each class. We also predict the subcellular localization of five whole proteomes (Homo sapiens, Mus musculus, Caenorhabditis elegans, Saccharomyces cerevisiae and Arabidopsis thaliana) and compare the protein content of each compartment.

Availability. BaCelLo can be accessed at [URL missing].

Contact. casadio@alma.unibo.it, andrea@biocomp.unibo.it, gigi@biocomp.unibo.it, piero@biocomp.unibo.it
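The composition-based input described in the Results can be illustrated with a minimal sketch. It is not the BaCelLo implementation: it uses only the raw residue sequence (no alignment profiles), and the function names `composition` and `encode`, as well as the 20-residue terminal windows, are assumptions made for illustration, since the abstract does not state the window lengths.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the relative frequency of each standard residue in `segment`.

    Non-standard characters (e.g. 'X') are ignored. An empty segment
    yields a vector of zeros.
    """
    counts = Counter(segment)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode(sequence, n_len=20, c_len=20):
    """Concatenate whole-sequence, N-terminal and C-terminal compositions.

    `n_len` and `c_len` are hypothetical window lengths, not values
    taken from the source. The result has 3 * 20 = 60 entries.
    """
    seq = sequence.upper()
    n_term = seq[:n_len]
    c_term = seq[-c_len:] if c_len > 0 else ""
    return composition(seq) + composition(n_term) + composition(c_term)


features = encode("MKLLAAAC", n_len=3, c_len=2)
```

A vector of this kind could be passed to any standard classifier; the source uses SVMs arranged in a decision tree, with a balancing procedure whose details are not given in the abstract.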