Semantics-aware open information extraction in the biomedical domain

  • Authors: Victoria Nebot; Rafael Berlanga

  • Affiliations: Lenguajes y Sistemas Informáticos, Castellón, Spain; Lenguajes y Sistemas Informáticos, Castellón, Spain

  • Venue: Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences
  • Year: 2011

Abstract

The increasing amount of biomedical scientific literature published on the Web demands new tools and methods to automatically process it and extract relevant information. Traditional information extraction has focused on recognizing well-defined entities such as genes or proteins, which forms the basis for extracting the relations between the recognized entities. Most of the work has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of biomedical relations from text. The method is not geared to any specific sub-domain (e.g. protein-protein interactions or drug-drug interactions) and does not require any manual input or deep processing. In addition, the method uses the extracted relations to infer a set of abstract semantic relations and their signature types, which constitute a valuable source of knowledge when constructing formal knowledge bases. We enable seamless integration of the extracted relations with the available biomedical resources through the process of semantic annotation. The proposed approach has been successfully applied to the CALBC corpus (almost a million text documents), with UMLS used as the knowledge resource for semantic annotation.
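
The abstract names three steps: semantic annotation, open relation extraction, and inference of relation signatures. The following is only a minimal illustrative sketch of how such steps could fit together, not the authors' implementation. The lexicon, concept identifiers, semantic type names, and the function names `annotate`, `extract_triples`, and `infer_signatures` are all hypothetical; a real system would annotate against UMLS and use far more robust extraction than the adjacent-mention heuristic assumed here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini-lexicon standing in for a UMLS-style resource:
# surface form -> (concept id, semantic type). All ids and types are made up.
LEXICON = {
    "aspirin": ("C_ASPIRIN", "Drug"),
    "ibuprofen": ("C_IBUPROFEN", "Drug"),
    "headache": ("C_HEADACHE", "Symptom"),
    "fever": ("C_FEVER", "Symptom"),
}


def annotate(sentence):
    """Return (start, end, concept_id, semantic_type) for each lexicon hit."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        entry = LEXICON.get(match.group().lower())
        if entry:
            hits.append((match.start(), match.end(), entry[0], entry[1]))
    return hits


def extract_triples(sentence, max_gap_words=4):
    """Pair adjacent annotated mentions; the text between them is taken
    as the relation phrase (a deliberately naive, assumed heuristic)."""
    hits = annotate(sentence)
    triples = []
    for left, right in zip(hits, hits[1:]):
        phrase = sentence[left[1]:right[0]].strip().lower()
        # Keep only short phrases that contain at least one letter.
        if re.search(r"[a-z]", phrase) and len(phrase.split()) <= max_gap_words:
            triples.append((left[2], left[3], phrase, right[2], right[3]))
    return triples


def infer_signatures(sentences):
    """Count, per relation phrase, the (subject type, object type) pairs
    observed across all extracted triples."""
    signatures = defaultdict(Counter)
    for sentence in sentences:
        for _, subj_type, phrase, _, obj_type in extract_triples(sentence):
            signatures[phrase][(subj_type, obj_type)] += 1
    return signatures
```

Under these assumptions, calling `infer_signatures(["Aspirin reduces headache.", "Ibuprofen reduces fever in children."])` groups both sentences under the phrase `reduces` with the type pair `("Drug", "Symptom")`. Counting type pairs per phrase is one simple way to approximate a relation signature; the paper itself may derive signatures differently.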