Machine learning tools are employed to establish a relationship between the characteristics of protein-ligand binding sites and enzyme class. Enzyme classification is a challenging problem from a data mining perspective because of (i) class imbalance and (ii) the need for appropriate feature selection. We address the problem by choosing novel features from the protein binding site. The Protein Ligand Interaction Database (PLID), which gives a comprehensive view of the binding sites in a protein along with other contact information, is updated and presented here as PLID v1.1. The database facilitates the study of protein-ligand interactions. Novel features arising from protein-ligand interaction, including chemical compound features as well as fraction of contact and tightness, are investigated for the classification task. The weighted classification accuracy for the data set with binding site residues as features is found to be 56% using a Random Forest classifier. It may be concluded that either the binding site features do not adequately represent the enzyme class information or the low accuracy is caused by the class imbalance. This problem needs further investigation.
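To illustrate why class imbalance makes a single accuracy figure hard to interpret, the following minimal sketch compares a support-weighted accuracy with an unweighted (balanced) one. The abstract does not define how its weighted accuracy was computed, so both formulas are assumptions. The function names and the toy labels are hypothetical and are not taken from PLID.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in support.items()}


def support_weighted_accuracy(y_true, y_pred):
    """Average per-class recall, weighting each class by its sample count.

    Assumption: this is one common reading of "weighted accuracy";
    the source does not state which definition it uses.
    """
    support = Counter(y_true)
    recall = per_class_recall(y_true, y_pred)
    total = len(y_true)
    return sum(recall[c] * support[c] / total for c in support)


def balanced_accuracy(y_true, y_pred):
    """Average per-class recall, giving every class equal weight."""
    recall = per_class_recall(y_true, y_pred)
    return sum(recall.values()) / len(recall)


# Hypothetical toy labels: a majority class 1 and a minority class 2.
y_true = [1] * 8 + [2] * 2
y_pred = [1] * 8 + [1, 2]  # one of the two minority samples is missed

weighted = support_weighted_accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
print(round(weighted, 3), round(balanced, 3))
```

In this toy case the support-weighted score is dominated by the majority class, while the balanced score drops because half of the minority class is misclassified. Reporting both is one way to separate the effect of imbalance from the effect of weak features.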