Information extraction from biomedical text

  • Authors: Jerry R. Hobbs

  • Affiliations: USC Information Sciences Institute, Marina del Rey, CA

  • Venue: Journal of Biomedical Informatics - Special issue: Sublanguage
  • Year: 2002

Abstract

Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. It requires deeper analysis than keyword searches, but its aims fall short of the very hard and long-term problem of full text understanding. Information extraction thus represents a midpoint on this spectrum, where the aim is to capture structured information without sacrificing feasibility. One of the key ideas in this technology is to separate processing into several stages, implemented as cascaded finite-state transducers. The earlier stages recognize smaller linguistic objects and work in a largely domain-independent fashion. The later stages take these linguistic objects as input and find domain-dependent patterns among them. There are now initial efforts to apply this technology to biomedical text. In other domains, the technology plateaued at about 60% recall and precision. Even if applications to biomedical text do no better than this, they could still prove to be of immense help to curatorial activities.
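
The staged design described in the abstract can be illustrated with a minimal sketch. It is not the author's system: Python regular expressions stand in for finite-state transducers, and the lexicons (PROTEINS, RELATION_VERBS), the one-letter labels, and all function names are hypothetical choices made only for this example. The first stage tokenizes without any domain knowledge, the second labels small units, and the last searches the label sequence for one domain-dependent pattern, "entity, relation verb, entity".

```python
import re

# Hypothetical toy lexicons, assumed only for this illustration.
PROTEINS = {"p53", "MDM2", "RAF1"}
RELATION_VERBS = {"activates", "inhibits", "binds"}


def tokenize(text):
    """Stage 1 (domain-independent): split text into words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


def label(tokens):
    """Stage 2: attach a one-letter code to each token.

    E = entity from the lexicon, V = relation verb, O = anything else.
    """
    labeled = []
    for tok in tokens:
        if tok in PROTEINS:
            labeled.append((tok, "E"))
        elif tok.lower() in RELATION_VERBS:
            labeled.append((tok.lower(), "V"))
        else:
            labeled.append((tok, "O"))
    return labeled


# Stage 3 pattern (domain-dependent): entity, verb, entity,
# allowing up to two other tokens on either side of the verb.
RELATION_RE = re.compile(r"EO{0,2}VO{0,2}E")


def extract_relations(labeled):
    """Stage 3: match the pattern against the string of stage-2 codes."""
    codes = "".join(code for _, code in labeled)
    relations = []
    for match in RELATION_RE.finditer(codes):
        # One code per token, so match offsets index directly into `labeled`.
        span = labeled[match.start():match.end()]
        verb = next(tok for tok, code in span if code == "V")
        relations.append((span[0][0], verb, span[-1][0]))
    return relations


def extract(text):
    """Run the cascade: each stage consumes the previous stage's output."""
    return extract_relations(label(tokenize(text)))
```

Under these assumptions, extract("MDM2 inhibits p53.") returns [("MDM2", "inhibits", "p53")], while a sentence containing no lexicon entries returns an empty list. Because the final stage sees only the label sequence, the domain pattern can be changed without touching the earlier stages, which is the separation the abstract describes.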