Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach

Authors:
Fabio Rinaldi; Gerold Schneider; Kaarel Kaljurand; Michael Hess; Christos Andronis; Ourania Konstandi; Andreas Persidis
Affiliations:
Fabio Rinaldi, Gerold Schneider, Kaarel Kaljurand, Michael Hess: Institute of Computational Linguistics, University of Zurich, Binzmühlestrasse 14, CH-8050 Zürich, Switzerland
Christos Andronis, Ourania Konstandi, Andreas Persidis: Biovista, 34 Rodopoleos Str., Ellinikon, GR-16777 Athens, Greece
Venue:
Artificial Intelligence in Medicine
Year:
2007

Abstract

Objective: The number of new discoveries in the biomedical domain, as reported in the scientific literature, is growing at an exponential rate. This growth makes it very difficult to pick out the most relevant results, so extracting the core information becomes very expensive. There is therefore growing interest in text-processing approaches that can deliver selected information from scientific publications and thus limit the amount of human intervention normally needed to gather those results.

Materials and methods: This paper presents and evaluates an approach aimed at automating the extraction of functional relations (e.g. interactions between genes and proteins) from the biomedical scientific literature. The approach is based on a complete syntactic analysis of the corpus, produced by a novel dependency-based parser.

Results: We have implemented a state-of-the-art text mining system for biomedical literature based on a deep-linguistic, full-parsing approach. The results are validated on two different corpora: the manually annotated genomics information access (GENIA) corpus and the automatically annotated Arabidopsis thaliana circadian rhythms (ATCR) corpus.

Conclusion: We show how, contrary to common belief, a deep-linguistic approach can be used in a real-world text mining application, offering high-precision relation extraction while retaining sufficient recall.
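The abstract does not spell out the extraction rules, so the following is only a minimal sketch, under stated assumptions, of how relations might be read off a dependency parse and then scored. It is not the authors' parser or rule set. It assumes the parser output is a list of (head, label, dependent) edges over token indices, uses hypothetical labels `subj`, `obj`, `psubj` (passive surface subject) and `agent` (passive by-phrase), a hypothetical verb lexicon `INTERACTION_VERBS`, and a precomputed set of token indices already tagged as gene/protein names. The protein names in the usage are placeholders.

```python
from dataclasses import dataclass

# Hypothetical lexicon of interaction verbs (lemmas); the real system's list is not given.
INTERACTION_VERBS = {"activate", "bind", "inhibit", "interact", "regulate"}


@dataclass(frozen=True)
class Dep:
    """One dependency edge: head token index, label, dependent token index."""
    head: int
    label: str
    dependent: int


def extract_relations(lemmas, entities, deps):
    """Return (agent, verb, target) lemma triples from one parsed sentence.

    lemmas:   token lemmas, indexed by position
    entities: set of token indices tagged as gene/protein names
    deps:     list of Dep edges (hypothetical label set, see lead-in)
    """
    agent_of, target_of = {}, {}
    for d in deps:
        if d.label in ("subj", "agent"):    # active subject or passive by-phrase
            agent_of[d.head] = d.dependent
        elif d.label in ("obj", "psubj"):   # active object or passive surface subject
            target_of[d.head] = d.dependent

    relations = set()
    for verb, agent in agent_of.items():
        target = target_of.get(verb)
        # Keep only interaction verbs whose two arguments are both named entities.
        if (target is not None
                and lemmas[verb] in INTERACTION_VERBS
                and agent in entities and target in entities):
            relations.add((lemmas[agent], lemmas[verb], lemmas[target]))
    return relations


def precision_recall_f1(predicted, gold):
    """Score a set of predicted relations against a gold-standard set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # "protA inhibits protB" (active) vs. "protB is inhibited by protA" (passive)
    active = extract_relations(
        ["protA", "inhibit", "protB"], {0, 2},
        [Dep(1, "subj", 0), Dep(1, "obj", 2)])
    passive = extract_relations(
        ["protB", "be", "inhibit", "by", "protA"], {0, 4},
        [Dep(2, "psubj", 0), Dep(2, "agent", 4)])
    print(active == passive)  # True
    # One correct prediction against a two-item gold set: precision 1.0, recall 0.5
    print(precision_recall_f1(active, active | {("protC", "bind", "protD")}))
```

Mapping the passive surface subject to the target and the by-phrase to the agent lets the active and passive sentences yield the same triple, which is the kind of normalization a full syntactic analysis makes available. Requiring both arguments to be tagged entities and the verb to be in the lexicon favours precision over recall, illustrating the balance the Conclusion describes.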