In biomedical articles, a single term often refers to different protein entities. For example, an arbitrary occurrence of the term p53 might denote any one of thousands of proteins across a number of species. A human annotator can resolve this ambiguity relatively easily by looking at the term's context and, if necessary, by searching an appropriate protein database. However, the ambiguity causes considerable trouble for a text mining system, which does not understand human language and hence cannot identify the correct protein that the term refers to. In this paper, we present a Term Identification system that automatically assigns unique identifiers, as found in a protein database, to ambiguous protein mentions in text. Unlike other solutions described in the literature, which only work on gene/protein mentions from a specific model organism, our system is able to tackle protein mentions across many species by integrating a machine-learning-based species tagger. We have compared the performance of our automatic system to that of human annotators, with very promising results.
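To make the idea concrete, the following is a minimal sketch of species-aware term identification, not the system described in the paper. It assumes a toy lexicon with placeholder identifiers and replaces the machine-learning-based species tagger with a simple cue-word count; the names `LEXICON`, `SPECIES_CUES`, `tag_species`, and `identify_term` are all hypothetical.

```python
from typing import Optional

# Hypothetical toy lexicon: (lower-cased term, species) -> placeholder identifier.
# A real system would look these up in a protein database.
LEXICON = {
    ("p53", "human"): "ID_P53_HUMAN",
    ("p53", "mouse"): "ID_P53_MOUSE",
}

# Hypothetical cue words standing in for a trained species tagger.
SPECIES_CUES = {
    "human": ("human", "patient", "homo sapiens"),
    "mouse": ("mouse", "mice", "murine", "mus musculus"),
}


def tag_species(context: str) -> Optional[str]:
    """Return the species whose cue words occur most often in the context.

    Returns None when no cue word occurs at all. Ties go to the species
    listed first in SPECIES_CUES.
    """
    text = context.lower()
    counts = {
        species: sum(text.count(cue) for cue in cues)
        for species, cues in SPECIES_CUES.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def identify_term(term: str, context: str) -> Optional[str]:
    """Map an ambiguous protein mention to an identifier using its context."""
    species = tag_species(context)
    if species is None:
        return None  # no species evidence, so the mention stays unresolved
    return LEXICON.get((term.lower(), species))
```

Under these assumptions, `identify_term("p53", "Expression of p53 was measured in murine fibroblasts.")` resolves to the mouse entry, while a context with no species cue leaves the mention unresolved rather than guessing.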