Learning to handle negated language in medical records search

  • Authors:
  • Nut Limsopatham;Craig Macdonald;Iadh Ounis

  • Affiliations:
  • University of Glasgow, Glasgow, United Kingdom;University of Glasgow, Glasgow, United Kingdom;University of Glasgow, Glasgow, United Kingdom

  • Venue:
  • Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
  • Year:
  • 2013

Abstract

Negated language is frequently used by medical practitioners to indicate that a patient does not have a given medical condition. Traditionally, information retrieval systems do not distinguish between the positive and negative contexts of terms when indexing documents. For example, when searching for patients with angina, a retrieval system might wrongly consider a patient with a medical record stating "no evidence of angina" to be relevant. While it is possible to enhance a retrieval system by taking into account the context of terms within the indexing representation of a document, some non-relevant medical records can still be ranked highly if they include some of the query terms with the intended context. In this paper, we propose a novel learning framework that effectively handles negated language. Based on features related to the positive and negative contexts of a term, the framework learns how to appropriately weight the occurrences of the opposite context of any query term, thus preventing documents that may not be relevant from being retrieved. We thoroughly evaluate our proposed framework using the TREC 2011 and 2012 Medical Records track test collections. Our results show significant improvements over existing strong baselines. In addition, in combination with a traditional query expansion and a conceptual representation approach, our proposed framework could achieve a retrieval effectiveness comparable to the performance of the best TREC 2011 and 2012 systems, while not addressing other challenges in medical records search, such as the exploitation of semantic relationships between medical terms.
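
The idea of indexing terms by context and penalising occurrences in the opposite context can be illustrated with a minimal Python sketch. This is not the authors' implementation: the cue list, the `neg_` prefix, and the function names `tag_contexts` and `score` are hypothetical, and the fixed `opposite_weight` merely stands in for the weight that the proposed framework learns from features.

```python
import re
from collections import Counter

# Hypothetical negation cues (longest first); a real system would use a fuller lexicon.
NEGATION_CUES = [["no", "evidence", "of"], ["denies"], ["without"], ["no"]]


def tag_contexts(text):
    """Tokenise text, prefixing terms that follow a negation cue with 'neg_'."""
    tagged = []
    # Assumption: a negation scope ends at the next '.', ';' or ','.
    for clause in re.split(r"[.;,]", text.lower()):
        words = re.findall(r"[a-z]+", clause)
        negated = False
        i = 0
        while i < len(words):
            for cue in NEGATION_CUES:
                if words[i:i + len(cue)] == cue:
                    negated = True      # everything after the cue is negated
                    i += len(cue)       # skip the cue words themselves
                    break
            else:
                tagged.append(("neg_" if negated else "") + words[i])
                i += 1
    return tagged


def score(query_terms, text, opposite_weight=1.0):
    """Count positive-context matches and subtract weighted negated matches.

    opposite_weight is a fixed placeholder for a learned weight.
    """
    counts = Counter(tag_contexts(text))
    return sum(counts[t] - opposite_weight * counts["neg_" + t] for t in query_terms)
```

Under these assumptions, `score(["angina"], "No evidence of angina.")` is lower than `score(["angina"], "Patient has angina.")`, so the record that negates the condition is ranked below the one that asserts it.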