In this paper, we propose an integrated gene normaliser (GNer) that assigns a unique database identifier to each recognised gene mention in the biological literature. GNer combines Support Vector Machines (SVMs) with several rule-based components. First, we construct a dictionary from EntrezGene and BioThesaurus. Next, a purpose-built pre-processor reduces the variation and ambiguity among synonyms. Finally, an SVM-based disambiguation filter resolves the ambiguity that remains after exact matching. In our experiments, GNer achieves a precision of 80.5%, a recall of 86.4% and an Fβ=1 measure of 83.4%.
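The reported figures can be sanity-checked against the standard F-measure. The sketch below assumes the balanced form (β = 1, the harmonic mean of precision and recall); the function name `f_measure` is illustrative and does not come from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (already rounded).
score = f_measure(0.805, 0.864)
print(f"F(beta=1) = {100 * score:.2f}%")
```

Because the inputs are themselves rounded to one decimal place, the recomputed value can differ from the reported 83.4 in the last digit.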