Two-phase prediction of protein functions from biological literature based on Gini-Index

  • Authors:
  • Heum Park;DaeWon Park;Hyuk-Chul Kwon

  • Affiliations:
  • Pusan National University, Jangjeon Geumjung Busan, Republic of Korea;Pusan National University, Jangjeon Geumjung Busan, Republic of Korea;Pusan National University, Jangjeon Geumjung Busan, Republic of Korea

  • Venue:
  • Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2011

Abstract

This paper presents a two-phase model for predicting proteins and protein functions from biological literature, based on the Gini-Index algorithm. As the volume and diversity of biological resources grow, computational protein function prediction becomes increasingly important. We consider automatic Gene Ontology (GO) annotation through computational function prediction, combining a feature selection method based on the Gini-Index with a protein function prediction model. The Gini-Index has been used as a split measure for choosing the most appropriate splitting attribute in decision trees. Recently, a Gini-Index algorithm for feature selection in text categorization was introduced and shown to perform well. We therefore present a novel model that predicts both multi-label proteins from PubMed literature and their functions from the protein-function entries of GO Annotation. First, we introduce a feature selection algorithm with Gini-Index expressions to predict proteins from PubMed and obtain protein-text subsets. Second, we propose a novel two-phase prediction method for proteins and their functions using those subsets. In our experiments, we evaluated the prediction results for proteins and their functions obtained with the proposed methods. Overall, the methods performed well in predicting both proteins and protein functions from the biological literature.
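
To make the two uses of the Gini-Index mentioned in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the paper's actual Gini-Index expressions, so the term score used here, the sum over classes of P(t|c)^2 * P(c|t)^2 estimated from document frequencies, is an assumed formulation of the kind used in Gini-based text feature selection. The function names (`gini_impurity`, `gini_term_scores`, `select_features`) and the single-label input format are hypothetical simplifications; the paper itself addresses multi-label proteins.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2, the classic decision-tree split measure."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_term_scores(docs):
    """Score each term t as sum_c P(t|c)^2 * P(c|t)^2 (assumed formulation).

    docs: list of (terms, label) pairs, where terms is an iterable of strings
    and label is a single class (a simplification of the multi-label setting).
    Probabilities are estimated from document frequencies.
    """
    docs_per_class = Counter(label for _, label in docs)
    # term -> class -> number of documents of that class containing the term
    doc_freq = defaultdict(Counter)
    for terms, label in docs:
        for term in set(terms):
            doc_freq[term][label] += 1

    scores = {}
    for term, per_class in doc_freq.items():
        total = sum(per_class.values())  # documents containing the term
        scores[term] = sum(
            (n_tc / docs_per_class[c]) ** 2 * (n_tc / total) ** 2
            for c, n_tc in per_class.items()
        )
    return scores


def select_features(docs, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = gini_term_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this score, a term that occurs in every document of one class and in no other class reaches the maximum value of 1, while a term spread evenly across classes scores lower, so `select_features` keeps the terms that discriminate most sharply between classes (here, between candidate proteins).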