Algorithm for grounding mutation mentions from text to protein sequences

Authors:
Jonas Bergman Laurila;Rajaraman Kanagasabai;Christopher J. O. Baker
Affiliations:
University of New Brunswick, Saint John, New Brunswick, Canada;Institute for Infocomm Research, Singapore;University of New Brunswick, Saint John, New Brunswick, Canada
Venue:
DILS'10 Proceedings of the 7th international conference on Data integration in the life sciences
Year:
2010

Abstract

Protein mutations derived from in vitro experimental analysis are described in detail in scientific papers. Reusing mutation impact annotations is an important task in bioinformatics, and mutation grounding is a critical step in it. We present a method for grounding textual mentions of mutational changes to proteins found in papers. We distinguish between grounding mutation entities to protein database identifiers and grounding them to the correct positions on sequences retrieved from protein databases. The grounding workflow coordinates the extraction of mutation, protein, and organism mentions from text and uses these to identify target sequences. Mutation mentions are sequentially mapped onto candidate proteins so that they can be grounded to the correct protein sequence, independently of a protein-mutation tuple extraction task. Using a gold standard corpus of full-text articles and corresponding protein sequences, we show high precision and recall, and we discuss novel aspects of the algorithm in the context of previous work.
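The idea of mapping mutation mentions onto candidate sequences can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes mentions use the simple single-letter form (for example K2A), that positions in the text are 1-based and match the database numbering without any offset, and that candidate sequences have already been retrieved. The names `Mutation`, `parse_mutations`, `matches`, and `ground` are hypothetical, and the sequences in the usage note are toy data.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

class Mutation(NamedTuple):
    wild_type: str   # residue expected in the unmutated sequence
    position: int    # 1-based position as written in the text
    mutant: str      # residue introduced by the mutation

# Assumption: only single-letter point mutations such as "K2A" are recognised.
_POINT_MUTATION = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")

def parse_mutations(text: str) -> List[Mutation]:
    """Extract point-mutation mentions from free text."""
    return [Mutation(w, int(p), m) for w, p, m in _POINT_MUTATION.findall(text)]

def matches(mutation: Mutation, sequence: str) -> bool:
    """True if the sequence carries the wild-type residue at the stated position."""
    index = mutation.position - 1  # convert to 0-based indexing
    return 0 <= index < len(sequence) and sequence[index] == mutation.wild_type

def ground(
    mutations: List[Mutation], candidates: Dict[str, str]
) -> Tuple[Optional[str], List[Mutation]]:
    """Pick the candidate sequence consistent with the most mutation mentions.

    Returns the chosen identifier (None if nothing matches) and the mutations
    that could be placed on that sequence.
    """
    best_id: Optional[str] = None
    best_hits: List[Mutation] = []
    for protein_id, sequence in candidates.items():
        hits = [m for m in mutations if matches(m, sequence)]
        if len(hits) > len(best_hits):
            best_id, best_hits = protein_id, hits
    return best_id, best_hits
```

With the toy candidates `{"P1": "MKTAYIAK", "P2": "MATAYLAK"}` and the text "The K2A, A4G and I6V variants were inactive.", all three mentions agree with the wild-type residues of P1, while only A4G agrees with P2, so `ground` selects P1. A realistic system would also need to handle numbering offsets, three-letter residue names, and ties between candidates, which this sketch leaves out.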