Extracting and normalizing gene/protein mentions with the flexible and trainable Moara Java library

Authors:
Mariana L. Neves;José Maria Carazo;Alberto Pascual-Montano
Affiliations:
Biocomputing Unit, Centro Nacional de Biotecnología – CSIC, Madrid, Spain;Biocomputing Unit, Centro Nacional de Biotecnología – CSIC, Madrid, Spain;Biocomputing Unit, Centro Nacional de Biotecnología – CSIC, Madrid, Spain
Venue:
ISMB/ECCB'09 Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology
Year:
2009

Abstract

Gene/protein recognition and normalization are important prerequisite steps for many biological text mining tasks. Although great effort has been dedicated to these problems and effective solutions have been reported, easily integrated tools that perform these tasks are still scarce. We therefore propose Moara, a Java library that implements gene/protein recognition and normalization steps based on machine learning approaches. The system may be trained with extra documents for the recognition procedure, and new organisms may be added in the normalization step. The novelty of the methodology used in Moara lies in the design of a system that is not tailored to a specific organism and therefore does not need any organism-dependent tuning in the algorithms and in the dictionaries it uses. Moara can be used either as a standalone application or incorporated into a text mining system, and it is available at: http://moara.dacya.ucm.es