A novel system for predicting plant protein kinase superfamily by using machine learning methodology

  • Authors:
  • V. Mallika;K. C. Sivakumar;E. V. Soniya

  • Affiliations:
  • Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India;Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India;Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India

  • Venue:
  • ISB '10 Proceedings of the International Symposium on Biocomputing
  • Year:
  • 2010

Abstract

Protein kinases form one of the largest protein superfamilies and are involved in almost every cellular process. In plants, they are the subject of growing research because of their important roles in cellular communication, growth, and development. A tool that estimates the probability that a sequence is a plant protein kinase would simplify this effort and accelerate experimental characterization. To this end, a high-performance prediction server, 'PhytokinaseSVM', has been developed and implemented; it is available at http://type3pks.in/kinase. The server was built with a support vector machine, a kernel-based supervised learning technique, using compositional properties of the sequence, including dipeptide and multiplet frequencies. Based on the limited data available, the tool provides a simple, unique platform for estimating the probability that a given sequence is a plant protein kinase, with high accuracy (98%). PhytokinaseSVM achieved 96% specificity and 100% sensitivity when tested with 500 protein kinases and 500 non-protein kinases that were not part of the training dataset. Because it is freely available, we expect this tool to serve as a useful resource for plant protein kinase researchers. The tool also allows the prediction of other eukaryotic protein kinases. Work is in progress to further improve prediction accuracy by including more sequence features in the training dataset.
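
As a rough illustration of the ideas in the abstract, the sketch below computes a dipeptide frequency vector for a protein sequence and derives sensitivity, specificity, and accuracy from confusion-matrix counts. It is not the PhytokinaseSVM implementation: the function names, the restriction to the 20 standard amino acids, and the normalization by the number of valid dipeptides are all assumptions made for this example, and multiplet frequencies are left out because the abstract does not define them. The counts in the usage lines (500, 0, 480, 20) are inferred from the reported 100% sensitivity and 96% specificity on 500 + 500 test sequences, on the assumption that the 98% accuracy refers to the same test.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so every vector is comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies.

    Overlapping residue pairs are counted; pairs containing a
    non-standard symbol are skipped. Frequencies sum to 1 unless
    the sequence contains no valid pair, in which case all are 0.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def evaluation_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of true kinases recovered
    specificity = tn / (tn + fp)   # fraction of non-kinases rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical usage: a feature vector for a short made-up peptide.
features = dipeptide_composition("MKVLAAGIVG")
print(len(features))

# Counts inferred from the reported rates on 500 kinases and 500 non-kinases.
sens, spec, acc = evaluation_metrics(tp=500, fn=0, tn=480, fp=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

A vector of this kind is the sort of fixed-length input that a kernel-based classifier such as a support vector machine can be trained on, whatever the sequence length. With a balanced test set, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 98%.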