Syntactic-Semantic frames for clinical cohort identification queries

  • Authors:
  • Dina Demner-Fushman; Swapna Abhyankar

  • Affiliations:
  • National Library of Medicine, Bethesda, MD; National Library of Medicine, Bethesda, MD

  • Venue:
  • DILS'12: Proceedings of the 8th International Conference on Data Integration in the Life Sciences
  • Year:
  • 2012

Abstract

Large sets of electronic health record data are increasingly used in retrospective clinical studies and comparative effectiveness research. The desired patient cohort characteristics for such studies are best expressed as free-text descriptions. We present a syntactic-semantic approach to structuring these descriptions. We developed the approach on 60 training topics (descriptions) and evaluated it on 35 test topics provided within the 2011 TREC Medical Record evaluation. We evaluated the accuracy of the frames as well as the modifications needed to achieve near-perfect precision in identifying the top 10 eligible patients. Our automatic approach accurately captured 34 of the 35 test descriptions; 25 automatic frames needed no modifications for finding eligible patients. Further evaluations of the overall average retrieval effectiveness showed that frames are not needed for simple descriptions containing one or two key terms. However, our training results suggest that the frames are needed for more complex real-life cohort selection tasks.