Identifying interaction sentences from biological literature using automatically extracted patterns

Authors:
Haibin Liu; Christian Blouin; Vlado Kešelj
Affiliations:
Dalhousie University, Halifax, NS, Canada (all authors)
Venue:
BioNLP '09 Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
Year:
2009

Abstract

An important task in information retrieval is identifying sentences that describe significant relationships between key concepts. In this work, we propose a novel approach that automatically extracts patterns characterizing sentences that describe interactions involving concepts of molecular biology. A pattern is defined here as a sequence of specialized Part-of-Speech (POS) tags that captures the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as a POS array and then aligned to a collection of pre-extracted patterns. The quality of each alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a Genetic Algorithm (GA) to maximize the classification performance of the alignment scoring scheme. The system achieves an F-score of 0.834 in identifying sentences that describe interactions between biological entities. This performance depends mainly on the quality of preprocessing steps such as term identification and POS tagging.
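The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring scheme, the tag set, or the decision rule, so everything below is an assumption made for illustration: a standard global (Needleman-Wunsch) alignment over tag sequences, hypothetical `match`, `mismatch`, and `gap` weights, a hypothetical `PROTEIN` tag, and a hypothetical score `threshold` for classification.

```python
from typing import Sequence


def alignment_score(tags: Sequence[str], pattern: Sequence[str],
                    match: float = 1.0, mismatch: float = -1.0,
                    gap: float = -1.0) -> float:
    """Global alignment score between a sentence's POS tags and a pattern.

    The weights are illustrative defaults, not values from the paper.
    """
    m = len(pattern)
    # prev[j]: best score aligning the tags processed so far with pattern[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(tags) + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            sub = match if tags[i - 1] == pattern[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # align the two tags
                          prev[j] + gap,      # sentence tag left unaligned
                          curr[j - 1] + gap)  # pattern tag left unaligned
        prev = curr
    return prev[m]


def is_interaction_sentence(tags: Sequence[str],
                            patterns: Sequence[Sequence[str]],
                            threshold: float, **weights: float) -> bool:
    """Assumed decision rule: best score over all patterns meets a threshold.

    `patterns` must be non-empty.
    """
    best = max(alignment_score(tags, p, **weights) for p in patterns)
    return best >= threshold


# Hypothetical tag sequences: "PROTEIN" stands in for a specialized entity tag.
pattern = ["PROTEIN", "VBZ", "PROTEIN"]
sentence = ["PROTEIN", "VBZ", "IN", "PROTEIN"]
print(alignment_score(sentence, pattern))
print(is_interaction_sentence(sentence, [pattern], threshold=2.0))
```

In a setup like this, the weights and the threshold are the kind of parameters an optimizer such as a GA could tune against a classification metric like F-score on labeled sentences; which parameters the authors actually optimize is not stated in the abstract.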