New techniques for extracting features from protein sequences

Authors:
J. T. L. Wang; Q. Ma; D. Shasha; C. H. Wu
Affiliations:
Department of Computer and Information Science, New Jersey Institute of Technology, University Heights, Newark, New Jersey; Novartis Pharmaceuticals Corporation, Summit, New Jersey; Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, New York; National Biomedical Research Foundation, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC
Venue:
IBM Systems Journal - Deep computing for the life sciences
Year:
2001

Abstract

In this paper we propose new techniques for extracting features from protein sequences. We use these features as inputs to a Bayesian neural network (BNN) and apply the BNN to classify protein sequences obtained from the Protein Information Resource (PIR) database maintained at the National Biomedical Research Foundation. To evaluate the proposed approach, we compare it with other protein classifiers based on sequence alignment and machine learning methods. Experimental results show that the proposed classifier achieves high precision and that the bioinformatics tools studied in the paper are complementary.
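
The abstract does not say which features are extracted, so the following is only an illustrative sketch of the general idea of turning a variable-length protein sequence into a fixed-length numeric vector that a classifier such as a neural network could take as input. It assumes a simple 2-gram (adjacent residue pair) frequency encoding over the 20 standard amino acids. The function name `two_gram_features` and the encoding are assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from itertools import product
from typing import List

# Assumption: restrict to the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Fixed ordering of all 400 possible residue pairs, e.g. "AA", "AC", ...
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def two_gram_features(sequence: str) -> List[float]:
    """Hypothetical encoder: normalized 2-gram frequencies (400 values).

    Pairs containing non-standard residue codes are ignored.
    """
    seq = sequence.upper()
    # Count every overlapping pair of adjacent residues.
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Normalize only over pairs made of standard residues.
    total = sum(counts[p] for p in PAIRS)
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


if __name__ == "__main__":
    features = two_gram_features("MKVLAAGIVGL")
    print(len(features))
```

Every sequence maps to a vector of the same length regardless of how long the protein is, which is what lets a fixed-size network input layer consume it. Whether this particular encoding resembles the features proposed in the paper cannot be determined from the abstract alone.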