Soft Computing - A Fusion of Foundations, Methodologies and Applications - Special Issue on Evolutionary and Metaheuristics based Data Mining (EMBDM); Guest Editors: José A. Gámez, María J. del Jesús, José M. Puerta
This paper presents a two-phase model, based on the Gini index, for predicting proteins and protein functions from the biological literature. As the volume and diversity of biological resources grow, computational protein function prediction becomes increasingly important. We consider automatic annotation of the Gene Ontology (GO) through computational function prediction, combining a feature selection method based on the Gini index with a protein function prediction model. The Gini index has long been used as a split measure for choosing the most appropriate splitting attribute in decision trees. More recently, a Gini index algorithm for feature selection in text categorization was introduced and shown to perform well. Building on this, we present a novel model that predicts both multi-label proteins from the PubMed literature and their functions from the protein-function entries of the GO annotation. First, we introduce a feature selection algorithm with Gini index expressions to predict proteins from PubMed and obtain protein-text subsets. Second, we propose a novel two-phase prediction method for proteins and their functions using those subsets. We then evaluate the predictions of proteins and of their functions produced by the proposed methods. The methods performed well overall in predicting both proteins and protein functions from the biological literature.
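As a rough illustration of the idea of using the Gini index to rank text features, the sketch below scores each term by the decrease in Gini impurity obtained when documents are split on whether they contain the term, which is the standard decision tree split measure mentioned above. It is not the paper's own Gini index expressions, which the abstract does not give. The function names, the toy documents, and the labels `P1` and `P2` are hypothetical, and the example uses one label per document for simplicity rather than the multi-label setting.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(docs, labels, term):
    """Impurity decrease from splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    weighted = (
        (len(with_term) / n) * gini_impurity(with_term)
        + (len(without_term) / n) * gini_impurity(without_term)
    )
    return gini_impurity(labels) - weighted


def select_terms(docs, labels, k):
    """Return the k terms with the largest impurity decrease."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: gini_gain(docs, labels, t), reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of terms,
# each label is the protein the document is assumed to discuss.
docs = [
    {"kinase", "binding"},
    {"kinase", "atp"},
    {"transport", "membrane"},
    {"membrane", "binding"},
]
labels = ["P1", "P1", "P2", "P2"]

top_terms = select_terms(docs, labels, 2)
print(top_terms)
```

In this toy corpus, terms that occur only with one protein separate the classes perfectly and rank first, while a term that appears equally often with both proteins yields no impurity decrease and ranks last.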