Annotating protein function through lexical analysis

Authors:
Rajesh Nair;Burkhard Rost
Affiliations:
Department of Physics, Columbia University;Department of Biochemistry and Molecular Biophysics, Columbia University
Venue:
AI Magazine
Year:
2004

Abstract

We now know the full genomes of more than 60 organisms. The experimental characterization of the newly sequenced proteins lags behind this explosion of naked sequences (the sequence-function gap). The rate at which expert annotators add experimental information to the more or less controlled vocabularies of databases snails along at an even slower pace. Most methods that annotate protein function exploit sequence similarity by transferring experimental information from homologues. A crucial development aiding such transfer is the emergence of large-scale, work- and management-intensive projects aimed at developing a comprehensive ontology for gene-protein function, such as the Gene Ontology project. In parallel, fully automatic or semiautomatic methods have successfully begun to mine the existing data through lexical analysis. Some of these tools target parsing controlled vocabulary from databases; others venture into mining free text from MEDLINE abstracts or full scientific papers. Automated text analysis has become a rapidly expanding discipline in bioinformatics. A few of these tools have already been embedded in research projects.