XML-based data preparation for robust deep parsing

  • Authors:
  • Claire Grover;Alex Lascarides

  • Affiliations:
  • The University of Edinburgh, Edinburgh, UK;The University of Edinburgh, Edinburgh, UK

  • Venue:
  • ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the 'messiness' in real language data and improve parse performance.