Tagging biological entities, in particular gene and protein names, is an essential step in the analysis of textual information in Molecular Biology and Biomedicine. The problem is harder than originally thought, both because the research area is highly dynamic, with new genes and their functions constantly being discovered, and because commonly accepted naming standards are lacking. An impressive collection of techniques has been used to detect protein and gene names over the last four to five years, ranging from typical NLP to purely bioinformatics approaches. Here we explore the relationship between protein/gene names and the expressions used to characterize protein/gene function. These expressions are captured in a collection of patterns derived from an original set of manually derived expressions, extended to cover lexical variants, and filtered using known cases of association between patterns and names. Applying these patterns to a large collection of curated sentences, we found a significant number of patterns with a very strong tendency to appear only in sentences in which a protein/gene name is also present. This approach is part of a larger effort to incorporate contextual information so as to make biological information less ambiguous.
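The pattern-versus-name measurement described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pattern_name_cooccurrence`, the two regular expressions, and the toy sentences are all hypothetical, and the `has_name` flag is assumed to come from prior curation. For each pattern, the sketch counts the sentences it matches and the fraction of those that also contain a protein/gene name.

```python
import re


def pattern_name_cooccurrence(sentences, patterns):
    """Count, per pattern, how often it matches and how often a name co-occurs.

    sentences: list of (text, has_name) pairs, where has_name is a curated
               flag saying whether the sentence contains a protein/gene name.
    patterns:  dict mapping a pattern label to a regular expression string.
    Returns a dict: label -> (matches, matches_with_name, fraction or None).
    """
    stats = {}
    for label, expression in patterns.items():
        regex = re.compile(expression, re.IGNORECASE)
        matches = 0
        matches_with_name = 0
        for text, has_name in sentences:
            if regex.search(text):
                matches += 1
                if has_name:
                    matches_with_name += 1
        # Fraction is undefined when the pattern never matches.
        fraction = matches_with_name / matches if matches else None
        stats[label] = (matches, matches_with_name, fraction)
    return stats


# Toy, hand-labelled sentences (assumed data, for illustration only).
sentences = [
    ("Cdc28 is involved in the regulation of the cell cycle.", True),
    ("This kinase is required for the activation of Swe1.", True),
    ("The committee is involved in the regulation of funding.", False),
    ("Rad53 is required for the activation of the checkpoint.", True),
]

# Hypothetical function-describing patterns, not taken from the paper.
patterns = {
    "involved_in": r"\bis involved in the \w+ of\b",
    "required_for": r"\bis required for the \w+ of\b",
}

stats = pattern_name_cooccurrence(sentences, patterns)
for label, (matches, with_name, fraction) in stats.items():
    print(label, matches, with_name, fraction)
```

In this toy data the "required_for" pattern appears only in sentences that contain a name, while "involved_in" also matches a sentence without one; a pattern of the first kind is the sort that would count as strongly associated with names. A real study would need a much larger curated collection before any such fraction is meaningful.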