Extracting and normalizing gene/protein mentions with the flexible and trainable Moara Java library

  • Authors:
  • Mariana L. Neves; José Maria Carazo; Alberto Pascual-Montano

  • Affiliations:
  • Biocomputing Unit, Centro Nacional de Biotecnología – CSIC, Madrid, Spain; Biocomputing Unit, Centro Nacional de Biotecnología – CSIC, Madrid, Spain; Biocomputing Unit, Centro Nacional de Biotecnología – CSIC, Madrid, Spain

  • Venue:
  • ISMB/ECCB'09 Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology
  • Year:
  • 2009

Abstract

Gene/protein recognition and normalization are important prerequisite steps for many biological text mining tasks. Although great effort has been dedicated to these problems and effective solutions have been reported, easily integrated tools for performing these tasks are still scarce. We therefore propose Moara, a Java library that implements gene/protein recognition and normalization steps based on machine learning approaches. The system may be trained with extra documents for the recognition procedure, and new organisms may be added in the normalization step. The novelty of the methodology used in Moara lies in the design of a system that is not tailored to a specific organism and therefore does not need any organism-dependent tuning of its algorithms or of the dictionaries it uses. Moara can be used either as a standalone application or incorporated into a text mining system, and it is available at: http://moara.dacya.ucm.es