Validating candidate gene-mutation relations in MEDLINE abstracts via crowdsourcing

  • Authors:
  • John D. Burger; Emily Doughty; Sam Bayer; David Tresner-Kirsch; Ben Wellner; John Aberdeen; Kyungjoon Lee; Maricel G. Kann; Lynette Hirschman

  • Affiliations:
  • The MITRE Corporation, Bedford, MA; University of Maryland, Baltimore, MD; The MITRE Corporation, Bedford, MA; The MITRE Corporation, Bedford, MA; The MITRE Corporation, Bedford, MA; The MITRE Corporation, Bedford, MA; Harvard Medical School, Boston, MA; University of Maryland, Baltimore, MD; The MITRE Corporation, Bedford, MA

  • Venue:
  • DILS'12 Proceedings of the 8th international conference on Data Integration in the Life Sciences
  • Year:
  • 2012

Abstract

We describe an experiment to elicit judgments on the validity of gene-mutation relations in MEDLINE abstracts via crowdsourcing. The biomedical literature contains rich information on such relations, but the correct pairings are difficult to extract automatically because a single abstract may mention multiple genes and mutations. We presented candidate gene-mutation relations as Amazon Mechanical Turk HITs (human intelligence tasks). We extracted candidate mutations from a corpus of 250 MEDLINE abstracts using EMU, combined with curated gene lists from NCBI. The resulting document-level annotations were projected into the abstract text to highlight mentions of genes and mutations for review. Reviewers returned results within 36 hours. Initial weighted results, evaluated against a gold standard of expert-curated gene-mutation relations, achieved 85% accuracy, with the best reviewer achieving 91% accuracy. We expect performance to increase with further experimentation, providing a scalable approach for rapid manual curation of important biological relations.
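
The abstract does not say how reviewer judgments were weighted or combined, so the following is only a minimal sketch of one common approach: a weighted majority vote per candidate relation, scored against a gold standard. The function names, the item identifiers, the reviewer IDs, and the weight values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(judgments, weights):
    """Combine yes/no judgments into one label per candidate relation.

    judgments: iterable of (item_id, reviewer_id, label) with label a bool.
    weights:   dict mapping reviewer_id to a non-negative weight
               (assumed here; reviewers without an entry get weight 1.0).
    """
    scores = defaultdict(float)
    for item_id, reviewer_id, label in judgments:
        w = weights.get(reviewer_id, 1.0)
        # A "valid" vote adds the reviewer's weight, an "invalid" vote subtracts it.
        scores[item_id] += w if label else -w
    # Ties (score == 0) are treated as "invalid" in this sketch.
    return {item_id: score > 0 for item_id, score in scores.items()}


def accuracy(predicted, gold):
    """Fraction of gold-standard items whose predicted label matches."""
    shared = [item_id for item_id in gold if item_id in predicted]
    if not shared:
        return 0.0
    return sum(predicted[i] == gold[i] for i in shared) / len(shared)


# Made-up example: two candidate relations, three reviewers.
judgments = [
    ("doc1:GENE_A:MUT_1", "r1", True),
    ("doc1:GENE_A:MUT_1", "r2", True),
    ("doc1:GENE_A:MUT_1", "r3", False),
    ("doc1:GENE_B:MUT_1", "r1", False),
    ("doc1:GENE_B:MUT_1", "r2", True),
    ("doc1:GENE_B:MUT_1", "r3", False),
]
weights = {"r1": 0.9, "r2": 0.6, "r3": 0.8}  # hypothetical reviewer weights
gold = {"doc1:GENE_A:MUT_1": True, "doc1:GENE_B:MUT_1": False}

consensus = weighted_vote(judgments, weights)
score = accuracy(consensus, gold)
```

In a setup like this, reviewer weights might be estimated from agreement with a small set of control items, but that is an assumption rather than something the abstract states.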