HiSP: a probabilistic data mining technique for protein classification

Authors:
Luiz Merschmann; Alexandre Plastino
Affiliations:
Departamento de Ciência da Computação, Universidade Federal Fluminense, Niterói, Brazil (both authors)
Venue:
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
Year:
2006

Abstract

We propose Highest Subset Probability (HiSP), a new data mining technique based on Bayes' theorem, to solve the protein classification problem. The goal is to predict the functional family of novel protein sequences from their motif composition and to improve on the results obtained with other known approaches. To evaluate the proposal, we use datasets extracted from Prosite, a curated protein family database. In the computational experiments, the proposed method outperformed the other known methods on all tested datasets, and it looks very promising for problems with characteristics similar to the one addressed here.
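
The abstract does not describe how HiSP computes its probabilities, so the following is not the authors' method. It is only a minimal sketch of the general setting, assuming a plain Bernoulli naive Bayes baseline with Laplace smoothing: each protein is represented by the set of motifs it contains, and Bayes' theorem is used to pick the most probable family. The function names `train` and `classify`, the smoothing parameter `alpha`, and the motif and family labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Collect counts from (motif_set, family) pairs.

    Generic naive Bayes baseline, NOT the HiSP algorithm.
    alpha is an assumed Laplace smoothing constant.
    """
    class_counts = Counter(family for _, family in samples)
    motif_counts = defaultdict(Counter)  # family -> motif -> number of proteins
    vocabulary = set()
    for motifs, family in samples:
        vocabulary.update(motifs)
        for motif in motifs:
            motif_counts[family][motif] += 1
    return class_counts, motif_counts, vocabulary, alpha


def classify(model, motifs):
    """Return the family that maximizes log P(family) + sum of log P(motif evidence | family)."""
    class_counts, motif_counts, vocabulary, alpha = model
    total = sum(class_counts.values())
    best_family, best_score = None, -math.inf
    for family, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for motif in vocabulary:
            # Smoothed probability that a protein of this family contains the motif.
            p = (motif_counts[family][motif] + alpha) / (n + 2 * alpha)
            score += math.log(p if motif in motifs else 1.0 - p)
        if score > best_score:
            best_family, best_score = family, score
    return best_family


# Toy data with made-up motif and family names (not taken from Prosite).
samples = [
    ({"motif_a", "motif_b"}, "family_1"),
    ({"motif_a"}, "family_1"),
    ({"motif_c"}, "family_2"),
    ({"motif_c", "motif_d"}, "family_2"),
]
model = train(samples)
print(classify(model, {"motif_a"}))
```

Working in log space avoids numerical underflow when many motifs are multiplied together, and the smoothing keeps a motif never seen in a family from forcing that family's probability to zero.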