Granular support vector machines with association rules mining for protein homology prediction

Authors:
Yuchun Tang;Bo Jin;Yan-Qing Zhang
Affiliations:
Department of Computer Science, Georgia State University, P.O. Box 3994, Atlanta, GA 30302, USA;Department of Computer Science, Georgia State University, P.O. Box 3994, Atlanta, GA 30302, USA;Department of Computer Science, Georgia State University, P.O. Box 3994, Atlanta, GA 30302, USA
Venue:
Artificial Intelligence in Medicine
Year:
2005

Abstract

Objective: Predicting homology between protein sequences is one of the critical problems in computational biology. Complex classification problems of this kind are common in medical and biological information processing applications. Building a model that generalizes well from training samples is essential for mining knowledge that accurately classifies unseen samples and effectively supports human experts in making correct decisions.

Methodology: A new learning model called granular support vector machines (GSVM) is proposed based on our previous work. GSVM systematically and formally combines principles from statistical learning theory and granular computing theory, and thus provides an interesting new mechanism for addressing complex classification problems. It works by building a sequence of information granules and then building support vector machines (SVMs) in some of these granules on demand. A good granulation method that finds suitable granules is crucial for modeling a GSVM with good performance. In this paper, we also propose an association rules-based granulation method. Granules induced by association rules with high enough confidence and significant support are left as they are, because of their high "purity" and their significant effect on simplifying the classification task. For every other granule, an SVM is modeled to discriminate the corresponding data. In this way, a complex classification problem is divided into multiple smaller problems so that the learning task is simplified.

Results and conclusions: The proposed algorithm, here named GSVM-AR, is compared with SVM on the KDDCUP04 protein homology prediction data. The experimental results show that finding the splitting hyperplane is not a trivial task (the association rules must be selected carefully to avoid overfitting) and that GSVM-AR does show significant improvement over building a single SVM in the whole feature space. Another advantage is that GSVM-AR is easy to implement, which makes it practical to use. More importantly and more interestingly, GSVM provides a new mechanism to address complex classification problems.
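
The abstract does not give the details of the granulation procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes single-condition rules of the form "feature j > t implies class c", labels in {-1, +1}, and one shared linear SVM for all samples outside the pure granules (the abstract describes one SVM per remaining granule). The linear SVM is a plain hinge-loss subgradient trainer standing in for a real SVM solver. All function names, thresholds, and the toy data are hypothetical.

```python
def mine_rules(X, y, min_support, min_confidence):
    """Find rules 'x[j] > t => label' with enough support and confidence."""
    n, rules = len(X), []
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            covered = [lab for row, lab in zip(X, y) if row[j] > t]
            if not covered:
                continue
            for label in (-1, 1):
                hits = covered.count(label)
                # support = P(condition and label), confidence = P(label | condition)
                if hits / n >= min_support and hits / len(covered) >= min_confidence:
                    rules.append((j, t, label))
    return rules


def matching_label(rules, x):
    """Return the label of the first rule that fires on x, or None."""
    for j, t, label in rules:
        if x[j] > t:
            return label
    return None


def train_linear_svm(X, y, dim, lr=0.01, lam=0.01, epochs=200):
    """Stand-in SVM: subgradient descent on L2-regularized hinge loss."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 shrinkage
            if margin < 1:  # sample violates the margin
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def fit_gsvm_ar(X, y, min_support=0.35, min_confidence=0.95):
    """Leave rule-induced pure granules alone; train an SVM on the rest."""
    rules = mine_rules(X, y, min_support, min_confidence)
    rest = [(xi, yi) for xi, yi in zip(X, y) if matching_label(rules, xi) is None]
    svm = train_linear_svm([r[0] for r in rest], [r[1] for r in rest], len(X[0]))
    return rules, svm


def predict(model, x):
    rules, (w, b) = model
    label = matching_label(rules, x)
    if label is not None:  # x falls in a pure granule
        return label
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data (hypothetical): x[0] > 3 is a pure negative granule;
# the remaining samples are separated by the sign of x[1].
X = [[9, 0.1], [8, 0.9], [7, 0.5], [6, 0.3],
     [1, 2.0], [2, 2.5], [3, 3.0],
     [1, -2.0], [2, -2.5], [3, -3.0]]
y = [-1, -1, -1, -1, 1, 1, 1, -1, -1, -1]
model = fit_gsvm_ar(X, y)
```

On this toy set, a query such as `[8, 5.0]` is decided by the rule alone, while `[2, 3.0]` falls through to the SVM. Raising `min_support` and `min_confidence` keeps fewer rules, which is one simple way to reflect the abstract's caution about selecting rules carefully to avoid overfitting.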