Bootstrapping a Verb Lexicon for Biomedical Information Extraction

  • Authors:
  • Giulia Venturi;Simonetta Montemagni;Simone Marchi;Yutaka Sasaki;Paul Thompson;John McNaught;Sophia Ananiadou

  • Affiliations:
  • Istituto di Linguistica Computazionale, CNR, Pisa, Italy;Istituto di Linguistica Computazionale, CNR, Pisa, Italy;Istituto di Linguistica Computazionale, CNR, Pisa, Italy;School of Computer Science, University of Manchester, UK and National Centre for Text Mining, University of Manchester, UK;School of Computer Science, University of Manchester, UK and National Centre for Text Mining, University of Manchester, UK;School of Computer Science, University of Manchester, UK and National Centre for Text Mining, University of Manchester, UK;School of Computer Science, University of Manchester, UK and National Centre for Text Mining, University of Manchester, UK

  • Venue:
  • CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

Extracting information from text requires resources that describe both the syntactic and the semantic properties of lexical units. Because language use in specialized domains such as biology can differ markedly from that of the general domain, domain-specific resources are needed to ensure that the information extracted is as accurate as possible. We are building a large-scale lexical resource for the biology domain that provides information about predicate-argument structure, bootstrapped from a biomedical corpus on the subject of E. coli. The lexicon currently focuses on verbs and includes both automatically extracted syntactic subcategorization frames and semantic event frames based on annotation by domain experts. In addition, the lexicon contains manually added explicit links between semantic and syntactic slots in corresponding frames. To our knowledge, this lexicon is currently a unique resource within the biomedical domain.
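
As a rough illustration of the three kinds of information the abstract names (syntactic subcategorization frames, semantic event frames, and explicit links between their slots), a verb entry might be modelled as below. This is only a minimal sketch: the class names, the slot labels ("subj", "obj"), the role labels ("Agent", "Theme") and the example verb are hypothetical and are not taken from the lexicon itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SyntacticFrame:
    """A subcategorization frame: the syntactic slots a verb takes."""
    slots: List[str]  # hypothetical labels, e.g. ["subj", "obj"]


@dataclass
class EventFrame:
    """A semantic event frame: the semantic roles of the event."""
    roles: List[str]  # hypothetical labels, e.g. ["Agent", "Theme"]


@dataclass
class VerbEntry:
    """One verb with a syntactic frame, an event frame, and slot links."""
    lemma: str
    syntactic: SyntacticFrame
    semantic: EventFrame
    # Explicit links from a semantic role to a syntactic slot.
    links: Dict[str, str] = field(default_factory=dict)

    def slot_for(self, role: str) -> Optional[str]:
        """Return the syntactic slot linked to a semantic role, if any."""
        return self.links.get(role)

    def links_are_consistent(self) -> bool:
        """Check that every link joins a known role to a known slot."""
        return all(
            role in self.semantic.roles and slot in self.syntactic.slots
            for role, slot in self.links.items()
        )


# Hypothetical entry, invented for illustration only.
entry = VerbEntry(
    lemma="activate",
    syntactic=SyntacticFrame(slots=["subj", "obj"]),
    semantic=EventFrame(roles=["Agent", "Theme"]),
    links={"Agent": "subj", "Theme": "obj"},
)
```

Storing the links as a separate mapping, rather than merging the two frames, keeps the syntactic and semantic descriptions independent of each other, which matches the abstract's description of them as coming from different sources (automatic extraction and expert annotation).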