Exploring variations across biomedical subdomains

  • Authors:
  • Tom Lippincott;Diarmuid Ó Séaghdha;Lin Sun;Anna Korhonen

  • Affiliations:
  • University of Cambridge;University of Cambridge;University of Cambridge;University of Cambridge

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
  • Year:
  • 2010

Abstract

Previous research has demonstrated the importance of handling differences between domains such as "newswire" and "biomedicine" when porting NLP systems from one domain to another. In this paper we identify the related issue of subdomain variation, i.e., differences between subsets of a domain that might be expected to behave homogeneously. Using a large corpus of research articles, we explore how subdomains of biomedicine vary across a variety of linguistic dimensions and discover that there is rich variation. We conclude that an awareness of such variation is necessary when deploying NLP systems for use in single or multiple subdomains.