Validating candidate gene-mutation relations in MEDLINE abstracts via crowdsourcing

Authors:
John D. Burger;Emily Doughty;Sam Bayer;David Tresner-Kirsch;Ben Wellner;John Aberdeen;Kyungjoon Lee;Maricel G. Kann;Lynette Hirschman
Affiliations:
The MITRE Corporation, Bedford, MA;University of Maryland, Baltimore, MD;The MITRE Corporation, Bedford, MA;The MITRE Corporation, Bedford, MA;The MITRE Corporation, Bedford, MA;The MITRE Corporation, Bedford, MA;Harvard Medical School, Boston, MA;University of Maryland, Baltimore, MD;The MITRE Corporation, Bedford, MA
Venue:
DILS'12 Proceedings of the 8th international conference on Data Integration in the Life Sciences
Year:
2012

Abstract

We describe an experiment to elicit judgments on the validity of gene-mutation relations in MEDLINE abstracts via crowdsourcing. The biomedical literature contains rich information on such relations, but the correct pairings are difficult to extract automatically because a single abstract may mention multiple genes and mutations. We presented candidate gene-mutation relations to reviewers as Amazon Mechanical Turk HITs (human intelligence tasks). Candidate mutations were extracted from a corpus of 250 MEDLINE abstracts using EMU in combination with curated gene lists from NCBI. The resulting document-level annotations were projected onto the abstract text to highlight mentions of genes and mutations for review. Reviewers returned results within 36 hours. Evaluated against a gold standard of expert-curated gene-mutation relations, the initial weighted results achieved 85% accuracy, and the best individual reviewer achieved 91% accuracy. We expect performance to increase with further experimentation, providing a scalable approach for rapid manual curation of important biological relations.
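
The abstract reports "weighted results" but does not say how reviewer judgments were weighted. As an illustration only, the sketch below shows one common way to combine crowd judgments: a weighted vote in which each reviewer's yes/no answer counts in proportion to a per-reviewer weight (for example, accuracy on control items). The function names, the weight values, and the identifiers `c1`, `w1`, and so on are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(judgments, weights, default_weight=0.5):
    """Combine yes/no judgments on candidate relations by weighted vote.

    judgments: iterable of (candidate_id, worker_id, is_valid) tuples.
    weights:   dict mapping worker_id to a non-negative weight
               (assumed here; the paper's weighting is not specified).
    Returns a dict mapping candidate_id to True (judged valid) or False.
    """
    score = defaultdict(float)
    for candidate, worker, is_valid in judgments:
        w = weights.get(worker, default_weight)
        # A "valid" vote adds the worker's weight, an "invalid" vote subtracts it.
        score[candidate] += w if is_valid else -w
    # Ties (score == 0) are treated as "not valid".
    return {candidate: total > 0 for candidate, total in score.items()}


def accuracy(predicted, gold):
    """Fraction of candidates with a gold label whose prediction matches it."""
    shared = [c for c in predicted if c in gold]
    if not shared:
        return 0.0
    return sum(predicted[c] == gold[c] for c in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical judgments on two candidate gene-mutation pairs.
    judgments = [
        ("c1", "w1", True), ("c1", "w2", False), ("c1", "w3", False),
        ("c2", "w1", False), ("c2", "w2", True),
    ]
    weights = {"w1": 0.9, "w2": 0.3, "w3": 0.4}
    predicted = weighted_vote(judgments, weights)
    print(predicted)
    print(accuracy(predicted, {"c1": True, "c2": True}))
```

In this toy example the single high-weight reviewer outvotes two low-weight reviewers on `c1`, which is the behavior that distinguishes a weighted vote from a simple majority.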