GeneTUC, GENIA and Google: natural language understanding in molecular biology literature

Authors:
Rune Sætre;Harald Søvik;Tore Amble;Yoshimasa Tsuruoka
Affiliations:
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway;Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway;Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway;Department of Computer Science, University of Tokyo, Bunkyo-ku, Tokyo, Japan
Venue:
Transactions on Computational Systems Biology V
Year:
2006

Abstract

As the volume of biomedical literature grows, researchers need automatic information extraction to help them keep up. GeneTUC was developed to read biological texts and then answer questions about them. The system builds its knowledge base by parsing MEDLINE abstracts or other online text retrieved through the Google API. When the system encounters a word that is not in its dictionary, it can use the Google API to determine the word's semantic class automatically and add the word to the dictionary. The GeneTUC parser was evaluated against the manually tagged GENIA corpus with EvalB, giving bracketing precision and recall scores of 70.6% and 53.9% respectively. GeneTUC was able to parse 60.2% of the sentences, and its POS-tagging accuracy was 86.0%. These figures are below those of the best available taggers and parsers, but GeneTUC can also perform deep reasoning, such as anaphora resolution and question answering, which state-of-the-art parsers do not offer.
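
To make the bracketing scores concrete, the following is a minimal sketch of how labeled bracketing precision and recall can be computed for a single sentence. It is not the EvalB implementation and not GeneTUC code: the function name `bracket_scores`, the `(label, start, end)` span representation and the toy brackets are assumptions made for illustration, and EvalB applies additional normalisation conventions that are not reproduced here.

```python
from collections import Counter


def bracket_scores(gold, predicted):
    """Return (precision, recall) over labeled brackets.

    Each bracket is a (label, start, end) tuple. This representation is
    an assumption of the sketch, not the EvalB input format.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a predicted bracket counts as correct only
    # as many times as the same bracket occurs in the gold parse.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    return precision, recall


# Toy example (hypothetical brackets, not taken from the GENIA corpus).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("NP", 4, 5)]

precision, recall = bracket_scores(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case three of the five predicted brackets match the gold parse, and three of the four gold brackets are recovered. Precision can exceed recall, as in the reported 70.6% versus 53.9%, when the parser proposes fewer brackets than the gold annotation contains.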