Development and evaluation of a biomedical search engine using a predicate-based vector space model

  • Authors:
  • Myungjae Kwak; Gondy Leroy; Jesse D. Martinez; Jeffrey Harwell

  • Affiliations:
  • School of Information Technology, Middle Georgia State College, Macon, GA 31206, United States; School of Information Systems and Technology, Claremont Graduate University, Claremont, CA 91711, United States and Department of Management Information Systems, University of Arizona, Tucson, AZ ...; Cell Biology and Anatomy, Radiation Oncology, University of Arizona Cancer Center, Tucson, AZ 85719, United States; School of Information Systems and Technology, Claremont Graduate University, Claremont, CA 91711, United States

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2013
  • Smart Health and Wellbeing

    ACM Transactions on Management Information Systems (TMIS) - Special Issue on Informatics for Smart Health and Wellbeing

Abstract

Although the biomedical information available in articles and patents is increasing exponentially, we continue to rely on the same information retrieval methods and use very few keywords to search millions of documents. We are developing a fundamentally different approach that finds more precise and complete information with a single query by using predicates instead of keywords for both query and document representation. Predicates are triples: more complex data structures than keywords that carry more structured information. To make optimal use of them, we developed a new predicate-based vector space model and a query-document similarity function with adjusted tf-idf and a boost function. Using a test bed of 107,367 PubMed abstracts, we evaluated the first essential function: retrieving information. Cancer researchers provided 20 realistic queries, for which the top 15 abstracts were retrieved using a predicate-based (new) and a keyword-based (baseline) approach. Each abstract was evaluated, double-blind, by cancer researchers on a 0-5 point scale to calculate precision (0 versus higher) and relevance (0-5 score). Precision was significantly higher (p
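
The idea of representing both queries and documents as predicates in a vector space can be illustrated with a minimal sketch. The abstract does not specify the adjusted tf-idf or the boost function, so the code below substitutes plain smoothed tf-idf and cosine similarity as stand-ins. The (subject, relation, object) layout, the function names, and the sample predicates are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Assumption: a predicate is a (subject, relation, object) triple of strings.
Predicate = tuple[str, str, str]


def idf_weights(docs: list[list[Predicate]]) -> dict[Predicate, float]:
    """Smoothed inverse document frequency for each predicate in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(p for doc in docs for p in set(doc))
    return {p: math.log((1 + n_docs) / (1 + df)) + 1.0 for p, df in doc_freq.items()}


def tfidf_vector(preds: list[Predicate], idf: dict[Predicate, float]) -> dict[Predicate, float]:
    """Sparse tf-idf vector whose dimensions are whole predicates, not keywords."""
    counts = Counter(preds)
    # Predicates never seen in the corpus get weight 0.
    return {p: tf * idf.get(p, 0.0) for p, tf in counts.items()}


def cosine(a: dict[Predicate, float], b: dict[Predicate, float]) -> float:
    """Cosine similarity between two sparse vectors (stand-in for the paper's function)."""
    dot = sum(w * b.get(p, 0.0) for p, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: list[Predicate], docs: list[list[Predicate]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    idf = idf_weights(docs)
    q_vec = tfidf_vector(query, idf)
    scores = [(i, cosine(q_vec, tfidf_vector(d, idf))) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical example data, for illustration only.
docs = [
    [("p53", "inhibits", "tumor growth")],
    [("aspirin", "treats", "headache")],
]
query = [("p53", "inhibits", "tumor growth")]
ranked = rank(query, docs)
```

Because each dimension is an entire triple, a document matches only when subject, relation, and object all agree with the query. This is stricter than keyword matching, where any shared term contributes to the score.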