Semantic concept-enriched dependence model for medical information retrieval

  • Authors:
  • Sungbin Choi; Jinwook Choi; Sooyoung Yoo; Heechun Kim; Youngho Lee

  • Affiliations:
  • Department of Biomedical Engineering, Seoul National University, Seoul, Republic of Korea; Department of Biomedical Engineering, Seoul National University, Seoul, Republic of Korea; Center for Medical Informatics, Seoul National University Bundang Hospital, Gyeonggi-do, Republic of Korea; Department of Biomedical Engineering, Seoul National University, Seoul, Republic of Korea; Department of Information Technology, Gachon University, Incheon, Republic of Korea

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2014

Abstract

Objective: In medical information retrieval research, semantic resources have mostly been used to expand the original query terms or to estimate concept importance weights. However, the implicit term-dependency information contained in semantic concept terms has been overlooked, or at least underused, in most previous studies. In this study, we incorporate a semantic concept-based term-dependence feature into a formal retrieval model to improve its ranking performance.

Design: We assumed that the terms within a standardized medical concept, as used by medical professionals, are implicitly dependent on one another. We hypothesized that ranking performance could be improved by revising the ranking algorithm to favor documents that preserve those implicit dependencies. The implicit dependence features are extracted from the original query using MetaMap and incorporated into a semantic concept-enriched dependence model (SCDM). We designed four variants of the model, each with a distinct feature formulation method.

Measurements: We performed leave-one-out cross-validation on both a clinical document corpus (TREC Medical Records Track) and a medical literature corpus (OHSUMED), which are representative test collections in medical information retrieval research.

Results: Our semantic concept-enriched dependence model consistently outperformed other state-of-the-art retrieval methods. Analysis shows that the performance gain occurred independently of the concept's explicit importance in the query.

Conclusion: By capturing implicit knowledge about query term relationships and incorporating it into a ranking model, we could build a more robust and effective retrieval model, independent of concept importance.
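
The abstract does not give the SCDM feature functions, so the following is only an illustrative sketch of the general idea: terms that belong to the same medical concept are treated as dependent, and documents that keep them adjacent are rewarded. It assumes a simple linear mix of a Dirichlet-smoothed unigram score and an exact-adjacency score for within-concept term pairs. The function names, the weights `w_term` and `w_concept`, the smoothing parameter `mu`, and the concept spans (standing in for MetaMap output) are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def concept_pairs(concept_spans):
    """Adjacent term pairs inside each concept span.

    `concept_spans` is a hypothetical stand-in for concept mappings that a
    tool such as MetaMap might return, e.g. [["myocardial", "infarction"]].
    """
    pairs = []
    for span in concept_spans:
        pairs.extend(zip(span, span[1:]))
    return pairs


def count_ordered(doc, pair):
    """Count exact, in-order adjacent occurrences of `pair` in the token list."""
    return sum(1 for bigram in zip(doc, doc[1:]) if bigram == pair)


def smoothed_log_prob(freq, doc_len, coll_prob, mu):
    """Dirichlet-smoothed log probability of a feature in a document."""
    return math.log((freq + mu * coll_prob) / (doc_len + mu))


def score(query_terms, concept_spans, doc, coll_term_prob, coll_pair_prob,
          mu=2000.0, w_term=0.8, w_concept=0.2):
    """Linear mix of unigram and within-concept dependence evidence.

    The weights and the 1e-6 fallback collection probability are assumed
    values chosen for illustration only.
    """
    tf = Counter(doc)
    term_score = sum(
        smoothed_log_prob(tf[t], len(doc), coll_term_prob.get(t, 1e-6), mu)
        for t in query_terms
    )
    pair_score = sum(
        smoothed_log_prob(count_ordered(doc, p), len(doc),
                          coll_pair_prob.get(p, 1e-6), mu)
        for p in concept_pairs(concept_spans)
    )
    return w_term * term_score + w_concept * pair_score
```

Under these assumptions, two documents with identical term counts and lengths receive the same unigram score, so any difference in their ranking comes only from whether the concept's terms appear together. The four variants mentioned in the abstract presumably differ in how such features are formulated, which this sketch does not attempt to reproduce.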