Using dependency parsing and probabilistic inference to extract relationships between genes, proteins and malignancies implicit among multiple biomedical research abstracts

  • Authors:
  • Ben Goertzel;Hugo Pinto;Ari Heljakka;Izabela Freire Goertzel;Mike Ross;Cassio Pennachin

  • Affiliations:
  • Virginia Tech, Arlington, VA;Novamente LLC, Rockville, MD;Novamente LLC, Rockville, MD;Novamente LLC, Rockville, MD;SAIC, Kingstowne, VA;Novamente LLC, Rockville, MD

  • Venue:
  • LNLBioNLP '06 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology
  • Year:
  • 2006

Abstract

We describe BioLiterate, a prototype software system that infers relationships between genes, proteins and malignancies from research abstracts and has initially been tested in the domain of the molecular genetics of oncology. The architecture uses a natural language processing (NLP) module to extract entities, dependencies and simple semantic relationships from texts, then feeds these features into a probabilistic reasoning module, which combines the extracted semantic relationships to form new ones. One application of this system is the discovery of relationships that are not contained in any individual abstract but are implicit in the combined knowledge of two or more abstracts.
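
The cross-abstract inference described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Relation` type, the `infer_across_abstracts` function, the placeholder entity names, the confidence values and the simple rule of multiplying probabilities along a two-step chain. The abstract does not specify which inference rules BioLiterate applies, so this should not be read as the system's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str
    probability: float  # hypothetical confidence from the extraction step
    source: str         # identifier of the abstract the relation came from


def infer_across_abstracts(relations):
    """Chain A->B and B->C relations that come from different abstracts.

    Illustrative rule only: the inferred relation gets the product of the
    two input probabilities (an independence assumption).
    """
    explicit = {(r.subject, r.obj) for r in relations}
    inferred = []
    for first in relations:
        for second in relations:
            if first.obj != second.subject:
                continue  # the two relations do not share a middle entity
            if first.source == second.source:
                continue  # only combine knowledge from different abstracts
            if first.subject == second.obj:
                continue  # skip trivial cycles
            if (first.subject, second.obj) in explicit:
                continue  # already stated explicitly in some abstract
            inferred.append(
                Relation(
                    subject=first.subject,
                    predicate="indirectly_related_to",
                    obj=second.obj,
                    probability=first.probability * second.probability,
                    source=f"{first.source}+{second.source}",
                )
            )
    return inferred


# Placeholder relations standing in for the NLP module's output.
extracted = [
    Relation("GeneA", "activates", "ProteinB", 0.9, "abstract_1"),
    Relation("ProteinB", "promotes", "TumorC", 0.8, "abstract_2"),
]

results = infer_across_abstracts(extracted)
for r in results:
    print(r.subject, r.predicate, r.obj, round(r.probability, 2), r.source)
```

Here neither placeholder abstract links `GeneA` to `TumorC` directly; the link appears only when the two extracted relations are combined, which is the kind of implicit relationship the abstract refers to.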