Localization Site Prediction for Membrane Proteins by Integrating Rule and SVM Classification

Authors:
Senqiang Zhou; Ke Wang
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2005

Abstract

We study the localization prediction of membrane proteins for two families of medically important disease-causing bacteria, Gram-negative and Gram-positive bacteria. Each such bacterium has its cell surrounded by several layers of membranes. Identifying where proteins are located in a bacterial cell is of primary research interest for antibiotic and vaccine drug design. This problem has three requirements. First, because any subsequence of amino acid residues is potentially a dimension, the problem has extremely high dimensionality, and few of these dimensions are irrelevant. Second, the prediction of a target localization site must have high precision to be useful to biologists (at least 90 percent, or even 95 percent), while recall should be as high as possible. Achieving such precision is made harder by the fact that target sequences are often far outnumbered by background sequences. Third, the rationale of a prediction should be understandable to biologists so that they can act on it. Meeting all three requirements is a significant challenge, because high dimensionality calls for a complex model that is often hard to understand. The support vector machine (SVM) model performs outstandingly in high-dimensional spaces and therefore addresses the first two requirements. However, the SVM model involves many features in a single kernel function and therefore does not address the third requirement. We address all three requirements by integrating the SVM model with a rule-based model, in which understandable if-then rules capture "major structures" and the elaborate SVM model captures "subtle structures." Importantly, the integrated model preserves the precision/recall performance of SVM while exposing the major structures in a form understandable to the human user. We focus on searching for high-quality rules and on partitioning the prediction between rules and SVM so as to achieve these properties.
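The partition between if-then rules and the SVM can be illustrated with a minimal Python sketch. It assumes that proteins are encoded as sets of amino-acid subsequences (the feature names `KLV`, `GAG` and `PPE` are made up), that a candidate rule is kept only if its training precision reaches a threshold such as 0.9 and it covers at least two examples (an assumed selection criterion, not necessarily the paper's rule search), and that `toy_svm`, a fixed linear scorer, stands in for a trained SVM. Rules are tried first, and the SVM handles whatever they do not cover. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence, Tuple

# A protein is represented as the set of (hypothetical) amino-acid
# subsequences it contains; each subsequence is one binary dimension.
Features = FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    """If-then rule: if every subsequence in `antecedent` occurs, predict `label`."""
    antecedent: FrozenSet[str]
    label: int  # 1 = target localization site, 0 = background

    def fires(self, x: Features) -> bool:
        return self.antecedent <= x


def rule_stats(rule: Rule, data: Sequence[Tuple[Features, int]]) -> Tuple[float, int]:
    """Return (precision, coverage) of a rule on labeled training data."""
    covered = [y for x, y in data if rule.fires(x)]
    if not covered:
        return 0.0, 0
    correct = sum(1 for y in covered if y == rule.label)
    return correct / len(covered), len(covered)


def select_rules(candidates: Sequence[Rule],
                 data: Sequence[Tuple[Features, int]],
                 min_precision: float = 0.9,
                 min_coverage: int = 2) -> List[Rule]:
    """Keep only rules that are precise and general enough (assumed criterion)."""
    kept = []
    for rule in candidates:
        precision, coverage = rule_stats(rule, data)
        if precision >= min_precision and coverage >= min_coverage:
            kept.append(rule)
    return kept


class RuleThenSVM:
    """Rules classify the examples they cover; the SVM classifies the rest."""

    def __init__(self, rules: Sequence[Rule], svm_predict: Callable[[Features], int]):
        self.rules = list(rules)
        self.svm_predict = svm_predict

    def predict(self, x: Features) -> Tuple[int, str]:
        # Return the label together with a human-readable reason for it.
        for rule in self.rules:
            if rule.fires(x):
                return rule.label, "rule: " + " & ".join(sorted(rule.antecedent))
        return self.svm_predict(x), "svm"


# Stand-in for a trained linear SVM: sign(w . x + b) over binary features.
WEIGHTS = {"KLV": 1.0, "GAG": 0.2, "PPE": -0.5}
BIAS = -0.5


def toy_svm(x: Features) -> int:
    score = sum(WEIGHTS.get(f, 0.0) for f in x) + BIAS
    return 1 if score >= 0 else 0


train = [
    (frozenset({"KLV", "GAG"}), 1),
    (frozenset({"KLV", "PPE"}), 1),
    (frozenset({"KLV"}), 1),
    (frozenset({"GAG"}), 0),
    (frozenset({"PPE"}), 0),
]
candidates = [Rule(frozenset({"KLV"}), 1), Rule(frozenset({"GAG"}), 1)]
rules = select_rules(candidates, train)  # the GAG rule (precision 0.5) is dropped
model = RuleThenSVM(rules, toy_svm)

print(model.predict(frozenset({"KLV", "PPE"})))  # (1, 'rule: KLV')
print(model.predict(frozenset({"GAG"})))         # (0, 'svm')
```

Because every prediction carries its source, a reader can see which predictions rest on a readable rule and which come from the SVM's kernel function.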
We evaluate our method on several membrane localization problems. The purpose of this paper is not to improve the precision/recall of SVM, but to expose the rationale of an SVM classifier by partitioning the classification between if-then rules and the SVM classifier while preserving the SVM's precision/recall.
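One way to check that a rule/SVM partition preserves the SVM's precision/recall is to score both classifiers on the same held-out data and also report how much of the prediction the rules explain. The sketch below is hypothetical: `compare`, the protein IDs and the stubbed SVM outputs are illustrative, and the hybrid classifier is assumed to be any function returning a `(label, source)` pair, where `source` is `"svm"` when no rule fired.

```python
from typing import Callable, Dict, Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the target class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def compare(svm_predict: Callable[[str], int],
            hybrid_predict: Callable[[str], Tuple[int, str]],
            X: Sequence[str],
            y: Sequence[int]) -> Dict[str, object]:
    """Compare SVM and hybrid precision/recall and measure rule coverage."""
    svm_pred = [svm_predict(x) for x in X]
    hybrid_out = [hybrid_predict(x) for x in X]
    hybrid_pred = [label for label, _ in hybrid_out]
    return {
        "svm": precision_recall(y, svm_pred),
        "hybrid": precision_recall(y, hybrid_pred),
        # Fraction of examples whose prediction is explained by an if-then rule.
        "rule_coverage": sum(1 for _, src in hybrid_out if src != "svm") / len(X),
        # Fraction of examples on which the hybrid and the SVM agree.
        "agreement": sum(1 for a, b in zip(svm_pred, hybrid_pred) if a == b) / len(X),
    }


# Toy held-out set: hypothetical protein IDs, true labels and SVM outputs.
X = ["p1", "p2", "p3", "p4", "p5", "p6"]
y = [1, 1, 1, 0, 0, 0]
svm_out = {"p1": 1, "p2": 1, "p3": 0, "p4": 0, "p5": 1, "p6": 0}


def hybrid(x: str) -> Tuple[int, str]:
    # Pretend a rule covers p1 and p2; everything else falls back to the SVM.
    if x in {"p1", "p2"}:
        return 1, "rule"
    return svm_out[x], "svm"


report = compare(svm_out.__getitem__, hybrid, X, y)
print(report)
```

Here the hybrid matches the SVM on every example, so both reach precision and recall of 2/3, while one third of the predictions come with an if-then explanation. Full agreement is a sufficient, not a necessary, condition for preserving precision/recall.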