Gene/protein recognition and normalization are important prerequisite steps for many biological text mining tasks. Although considerable effort has been devoted to these problems and effective solutions have been reported, tools that can be easily integrated to perform these tasks are still scarce. We therefore propose Moara, a Java library that implements gene/protein recognition and normalization based on machine learning approaches. The system can be trained with additional documents for the recognition step, and new organisms can be added for the normalization step. The novelty of the methodology used in Moara lies in the design of a system that is not tailored to a specific organism and therefore needs no organism-dependent tuning of its algorithms or of the dictionaries it uses. Moara can be used either as a standalone application or incorporated into a text mining system, and it is available at: http://moara.dacya.ucm.es