Frequent-subsequence-based prediction of outer membrane proteins

Authors: Rong She, Fei Chen, Ke Wang, Martin Ester, Jennifer L. Gardy, Fiona S. L. Brinkman
Affiliation: Simon Fraser University (all authors)
Venue: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2003

Abstract

A number of medically important disease-causing bacteria (collectively called Gram-negative bacteria) are noted for the extra "outer" membrane that surrounds their cells. Proteins resident in this membrane (outer membrane proteins, or OMPs) are of primary research interest for antibiotic and vaccine design because they sit on the bacterial surface and are therefore the most accessible targets for new drugs. With advances in genome sequencing technology and bioinformatics, biologists can now deduce all the proteins likely to be produced by a given bacterium and have attempted to predict where each protein is located in the bacterial cell. However, current protein localization programs are least accurate when predicting OMPs, so a better OMP classifier is needed. Data mining research suggests that classifiers built on frequent patterns can be both accurate and efficient. In this paper, we present two methods for identifying OMPs based on frequent subsequences and test them on all Gram-negative bacterial proteins whose localizations have been determined by biological experiments. One classifier follows an association rule approach; the other is based on support vector machines (SVMs). We compare the proposed methods with state-of-the-art methods from the biological domain. The results demonstrate that our methods outperform them both in identifying OMPs accurately and in providing biological insights that increase our understanding of the structures and functions of these important proteins.
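The abstract does not spell out how frequent subsequences become a classifier, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that "subsequences" are contiguous substrings of length 3 to 5, that support is the fraction of OMP sequences containing a pattern at least once, that a rule has the form "contains pattern P -> OMP" scored by confidence, and that the SVM variant consumes binary presence vectors. All function names, parameters and the toy sequences are hypothetical illustrations.

```python
from collections import Counter


def frequent_substrings(seqs, min_support, min_len=3, max_len=5):
    """Mine contiguous substrings (an assumption; the paper's pattern
    definition may differ) whose support, i.e. the fraction of sequences
    containing them at least once, is >= min_support."""
    counts = Counter()
    for s in seqs:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(s) - k + 1):
                seen.add(s[i:i + k])
        counts.update(seen)  # each sequence counts at most once per pattern
    n = len(seqs)
    return {p for p, c in counts.items() if c / n >= min_support}


def feature_vector(seq, patterns):
    """Binary presence vector over a fixed pattern list, usable as SVM input."""
    return [1 if p in seq else 0 for p in patterns]


def rule_confidence(pattern, pos, neg):
    """Confidence of the hypothetical rule 'contains pattern -> OMP'."""
    hits_pos = sum(pattern in s for s in pos)
    hits_neg = sum(pattern in s for s in neg)
    total = hits_pos + hits_neg
    return hits_pos / total if total else 0.0


def classify(seq, rules, min_conf):
    """Predict OMP if any rule with confidence >= min_conf fires on seq."""
    return any(p in seq for p, conf in rules if conf >= min_conf)


if __name__ == "__main__":
    # Toy, made-up sequences: three "OMPs" and two "non-OMPs".
    pos = ["MKAAGWTTV", "AAGWLLKQ", "QQAAGWRS"]
    neg = ["MLLKPQRST", "PPQRSTLLK"]
    patterns = sorted(frequent_substrings(pos, min_support=1.0))
    rules = [(p, rule_confidence(p, pos, neg)) for p in patterns]
    print(patterns)                               # ['AAG', 'AAGW', 'AGW']
    print(feature_vector("MAGWQ", patterns))      # [0, 0, 1]
    print(classify("TTAAGWXY", rules, 0.9))       # True
    print(classify("MLLKPQRST", rules, 0.9))      # False
```

The feature vectors could be handed to any SVM implementation, while the rule list plays the role of the association-rule classifier. A real system would also need to prune redundant patterns and tune the support and confidence thresholds, which this sketch leaves out.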