High-recall extraction of acronym-definition pairs with relevance feedback

  • Authors:
  • Anna Yarygina; Natalia Vassilieva

  • Affiliations:
  • Saint-Petersburg University; HP Labs

  • Venue:
  • Proceedings of the 2012 Joint EDBT/ICDT Workshops
  • Year:
  • 2012

Abstract

This paper addresses the problem of extracting acronyms and their definitions from large documents in a setting where high recall is required and user feedback is available. We propose a three-step approach. First, acronym candidates are extracted using a weak regular expression; this step yields a candidate list with high recall but low precision. Second, definitions are constructed for every acronym candidate from its surrounding text. Finally, a classifier selects the genuine acronym-definition pairs. In this last step, we use a relevance feedback mechanism to tune the classifier model for each particular document, which allows the method to achieve reasonable precision without losing recall. Unlike existing approaches, which are either designed to be generic and domain-independent or tuned to one particular domain, our method adapts to the input document. We evaluate the proposed approach on three datasets from different domains. The experiments support the validity of the presented ideas.
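
The three steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the regular expression, the definition-construction rule, the classifier, or the feedback update, so everything below is an assumption made for illustration: the pattern in `CANDIDATE_RE`, the preceding-word windows in `build_definitions`, the two toy features, and the perceptron-style update in `FeedbackClassifier.feedback` are hypothetical stand-ins, not the authors' method.

```python
import re

# Step 1 (assumed pattern): any 2-10 character alphanumeric token that
# contains at least two capital letters. Deliberately weak: high recall,
# low precision.
CANDIDATE_RE = re.compile(
    r"\b(?=[A-Za-z0-9]*[A-Z][A-Za-z0-9]*[A-Z])[A-Za-z0-9]{2,10}\b"
)


def extract_candidates(text):
    """Return regex match objects for all acronym candidates."""
    return list(CANDIDATE_RE.finditer(text))


def build_definitions(text, match, max_extra=2):
    """Step 2 (assumed rule): propose the last k words before the candidate
    as a definition, for k = 1 .. len(acronym) + max_extra."""
    words = re.findall(r"[A-Za-z0-9-]+", text[:match.start()])
    max_len = min(len(match.group()) + max_extra, len(words))
    return [" ".join(words[-k:]) for k in range(1, max_len + 1)]


def features(acronym, definition):
    """Toy feature vector (hypothetical): bias, initials match, length match."""
    words = definition.split()
    initials = "".join(w[0] for w in words).upper()
    return [
        1.0,
        1.0 if initials == acronym.upper() else 0.0,
        1.0 if len(words) == len(acronym) else 0.0,
    ]


class FeedbackClassifier:
    """Step 3 (assumed model): linear classifier tuned by user feedback."""

    def __init__(self, weights=(-0.5, 1.0, 0.0)):
        self.w = list(weights)

    def score(self, acronym, definition):
        x = features(acronym, definition)
        return sum(w * xi for w, xi in zip(self.w, x))

    def accept(self, acronym, definition):
        return self.score(acronym, definition) > 0

    def feedback(self, acronym, definition, is_genuine, lr=0.5):
        """Perceptron-style update, applied only when the current decision
        disagrees with the user's judgement."""
        if self.accept(acronym, definition) != is_genuine:
            sign = 1.0 if is_genuine else -1.0
            x = features(acronym, definition)
            self.w = [w + lr * sign * xi for w, xi in zip(self.w, x)]


def extract_pairs(text, clf):
    """Run all three steps and return the accepted (acronym, definition) pairs."""
    pairs = []
    for m in extract_candidates(text):
        for definition in build_definitions(text, m):
            if clf.accept(m.group(), definition):
                pairs.append((m.group(), definition))
    return pairs
```

In this sketch, a single user judgement on a pair the classifier got wrong shifts the weights for the current document. A realistic system would need both positive and negative judgements and richer features to keep precision from degrading as the weights move.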