A number of medically important disease-causing bacteria (collectively called Gram-negative bacteria) are noted for the extra "outer" membrane that surrounds their cells. Proteins resident in this membrane (outer membrane proteins, or OMPs) are of primary research interest for antibiotic and vaccine design because they lie on the surface of the bacteria and are therefore the most accessible targets for new drugs. With the development of genome sequencing technology and bioinformatics, biologists can now deduce all the proteins that a given bacterium is likely to produce, and have attempted to classify where those proteins are located in the bacterial cell. However, such protein localization programs are currently least accurate when predicting OMPs, so there is a need for a better OMP classifier. Data mining research suggests that frequent patterns can aid the development of accurate and efficient classification algorithms. In this paper, we present two methods that identify OMPs based on frequent subsequences and test them on all Gram-negative bacterial proteins whose localizations have been determined by biological experiments. One classifier follows an association rule approach, while the other is based on support vector machines (SVMs). We compare the proposed methods with state-of-the-art methods in the biological domain. The results demonstrate that our methods are better both at accurately identifying OMPs and at providing biological insights that increase our understanding of the structures and functions of these important proteins.
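To give a rough feel for the association rule idea, the following is a minimal illustrative sketch and not the method of the paper. It assumes contiguous length-k substrings as a simplified stand-in for frequent subsequences, and every name, threshold, and toy sequence in it is hypothetical. A pattern that is frequent among the positive (OMP-like) sequences becomes a rule "pattern implies OMP" if its confidence over the training data is high enough.

```python
from collections import Counter

def substrings(seq, k):
    """Return the set of distinct length-k contiguous substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def frequent_substrings(seqs, k, min_support):
    """Length-k substrings present in at least min_support (a fraction) of seqs."""
    counts = Counter()
    for s in seqs:
        counts.update(substrings(s, k))  # a pattern counts once per sequence
    threshold = min_support * len(seqs)
    return {p for p, c in counts.items() if c >= threshold}

def build_rules(pos, neg, k, min_support, min_confidence):
    """Keep patterns frequent in pos whose rule 'pattern -> positive' is confident."""
    rules = {}
    for p in frequent_substrings(pos, k, min_support):
        n_pos = sum(p in s for s in pos)
        n_neg = sum(p in s for s in neg)
        confidence = n_pos / (n_pos + n_neg)
        if confidence >= min_confidence:
            rules[p] = confidence
    return rules

def classify(seq, rules):
    """Predict positive if any rule pattern occurs in the sequence."""
    return any(p in seq for p in rules)

# Toy sequences invented purely for illustration; they are not real proteins.
pos = ["AYGAYV", "GAYVQW", "TAYGAF"]
neg = ["MKKLLP", "MKAYGL", "PLLKKM"]

rules = build_rules(pos, neg, k=3, min_support=0.6, min_confidence=0.8)
print(sorted(rules))
print(classify("QQGAYQ", rules), classify("MKAYGL", rules))
```

For an SVM-style variant, the same frequent patterns could instead serve as binary features (pattern present or absent) fed to any standard SVM implementation; that choice is likewise an assumption made here for illustration.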