Context sensitive vocabulary and its application in protein secondary structure prediction

Authors:
Yan Liu; Jaime Carbonell; Judith Klein-Seetharaman; Vanathi Gopalakrishnan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA and University of Pittsburgh, Pittsburgh, PA; University of Pittsburgh, Pittsburgh, PA
Venue:
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2004

Abstract

Protein secondary structure prediction is an important step towards understanding the relation between protein sequence and structure. However, most current prediction methods use features that are difficult for biologists to interpret. In this paper, we present a new method that applies information retrieval techniques to the problem: we extract a context-sensitive biological vocabulary for protein sequences and apply text classification methods to predict protein secondary structure. Experimental results show that our method performs comparably to state-of-the-art methods. Furthermore, the context-sensitive vocabularies can serve as a useful tool for discovering meaningful regular expression patterns for protein structures.
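The abstract does not say how the vocabulary is built or which text classifiers are used, so the following is only a minimal sketch under stated assumptions. It assumes a "context-sensitive word" is an amino-acid n-gram tagged with its offset from the residue being classified, treats each residue's sliding window as a small document, and substitutes a plain multinomial Naive Bayes classifier for the unspecified text classification step. The names `context_words` and `NaiveBayes`, the window and n-gram sizes, and the toy sequence with its H/E labels are hypothetical and illustrative; they are not the authors' vocabulary, classifiers, or data.

```python
import math
from collections import Counter, defaultdict


def context_words(seq, i, window=2, n=2):
    """Hypothetical 'context-sensitive words' for residue i of seq.

    Each word is an amino-acid n-gram tagged with its offset from i, so the
    same n-gram at different positions becomes a different vocabulary item.
    Positions beyond either end of the sequence are padded with '-'.
    Assumes 1 <= n <= 2 * window + 1 so that at least one n-gram fits.
    """
    padded = "-" * window + seq + "-" * window
    center = i + window
    words = []
    # Offsets run from -window to (window - n + 1) so every n-gram stays
    # inside the window [i - window, i + window].
    for off in range(-window, window - n + 2):
        start = center + off
        words.append(f"{off}:{padded[start:start + n]}")
    return words


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing.

    A stand-in for 'a text classification method'; the classifiers actually
    used in the paper are not specified in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def fit(self, docs, labels):
        for words, y in zip(docs, labels):
            self.class_counts[y] += 1
            self.word_counts[y].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for y, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[y].values()) + self.alpha * vocab_size
            # log P(y) + sum over words of log P(word | y), smoothed by alpha
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((self.word_counts[y][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = y, score
        return best_label


if __name__ == "__main__":
    # Toy sequence and labels invented for illustration only (H = helix, E = strand).
    seq, labels = "AAAAVVVV", "HHHHEEEE"
    docs = [context_words(seq, i) for i in range(len(seq))]
    clf = NaiveBayes().fit(docs, list(labels))
    print("".join(clf.predict(d) for d in docs))  # HHHHEEEE
```

Tagging each n-gram with its offset is one way to make the vocabulary context sensitive: "AV" immediately before the target residue and "AV" one position after it count as different words, at the cost of a larger and sparser vocabulary. Inspecting which offset-tagged words carry the most weight for each label is also a simple route to the kind of structure-associated patterns the abstract mentions.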