Exploring label dependency in active learning for phenotype mapping

  • Authors:
  • Shefali Sharma;Leslie Lange;Jose Luis Ambite;Yigal Arens;Chun-Nan Hsu

  • Affiliations:
  • University of Southern California, Marina del Rey, CA;University of North Carolina, Chaple Hills, NC;University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA and Institute of Information Sciences, Academia Sinica, Taipei, Taiwan

  • Venue:
  • BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many genetic epidemiological studies of human diseases have multiple variables related to any given phenotype, resulting from different definitions and multiple measurements or subsets of data. Manually mapping and harmonizing these phenotypes is a time-consuming process that may still miss the most appropriate variables. Previously, a supervised learning algorithm was proposed for this problem. That algorithm learns to determine whether a pair of phenotypes is in the same class. Though that algorithm accomplished satisfying F-scores, the need to manually label training examples becomes a bottleneck to improve its coverage. Herein we present a novel active learning solution to solve this challenging phenotype-mapping problem. Active learning will make phenotype mapping more efficient and improve its accuracy.