Using dependency parsing and probabilistic inference to extract relationships between genes, proteins and malignancies implicit among multiple biomedical research abstracts

  • Authors:
  • Ben Goertzel;Izabela Freire Goertzel;Hugo Pinto;Mike Ross;Ari Heljakka;Cassio Pennachin

  • Affiliations:
  • Virginia Tech, Arlington, VA;Novamente LLC, Rockville, MD;Novamente LLC, Rockville, MD;SAIC, Kingstowne, VA;Novamente LLC, Rockville, MD;Novamente LLC, Rockville, MD

  • Venue:
  • BioNLP '06 Proceedings of the Workshop on Linking Natural Language Processing and Biology: Towards Deeper Biological Literature Analysis
  • Year:
  • 2006

Abstract

We describe BioLiterate, a prototype software system that infers relationships between genes, proteins and malignancies from research abstracts, and that has initially been tested in the domain of the molecular genetics of oncology. The architecture uses a natural language processing (NLP) module to extract entities, dependencies and simple semantic relationships from texts, and then feeds these features into a probabilistic reasoning module, which combines the extracted semantic relationships to form new ones. One application of this system is the discovery of relationships that are not contained in any individual abstract but are implicit in the combined knowledge of two or more abstracts.
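
The cross-abstract idea can be illustrated with a toy sketch. This is not BioLiterate's reasoning module: it assumes a simple chaining rule (A relates to B, B relates to C, so A may relate to C) and multiplies confidences as if the two statements were independent, which is a deliberate simplification. The `Relation` type, the `infer_cross_abstract` function, and the entity names `GeneA`, `ProteinB` and `TumorC` are all hypothetical placeholders, not data or names from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Relation:
    """A semantic relationship as an NLP stage might emit it (hypothetical shape)."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # assumed to lie in [0, 1]
    source: str        # identifier of the abstract the relation came from


def infer_cross_abstract(relations):
    """Chain relations from different abstracts that share a middle entity.

    Assumption: confidences are combined by multiplication, treating the
    two premises as independent. A real probabilistic reasoner would use
    a more careful rule.
    """
    inferred = []
    for first, second in product(relations, repeat=2):
        shares_entity = first.obj == second.subject
        different_sources = first.source != second.source
        not_circular = first.subject != second.obj
        if shares_entity and different_sources and not_circular:
            inferred.append(
                Relation(
                    subject=first.subject,
                    predicate=f"{first.predicate}+{second.predicate}",
                    obj=second.obj,
                    confidence=first.confidence * second.confidence,
                    source=f"{first.source}+{second.source}",
                )
            )
    return inferred


# Made-up input: one relation from each of two hypothetical abstracts.
extracted = [
    Relation("GeneA", "activates", "ProteinB", 0.9, "abstract-1"),
    Relation("ProteinB", "promotes", "TumorC", 0.8, "abstract-2"),
]

for relation in infer_cross_abstract(extracted):
    print(relation)
```

Here neither abstract alone links `GeneA` to `TumorC`; the link only appears once the two extracted relations are combined, which is the kind of implicit relationship the abstract refers to.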