This paper proposes a method for identifying protein names in biomedical texts with an emphasis on detecting protein name boundaries. We use a probabilistic model which exploits several surface clues characterizing protein names and incorporates word classes for generalization. In contrast to previously proposed methods, our approach does not rely on natural language processing tools such as part-of-speech taggers and syntactic parsers, so as to reduce processing overhead and the potential number of probabilistic parameters to be estimated. A notion of certainty is also proposed to improve precision for identification. We implemented a protein name identification system based on our proposed method, and evaluated the system on real-world biomedical texts in conjunction with the previous work. The results showed that overall our system performs comparably to the state-of-the-art protein name identification system and that higher performance is achieved for compound names. In addition, it is demonstrated that our system can further improve precision by restricting the system output to those names with high certainties.
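As a rough illustration of two ideas mentioned above, the following sketch maps tokens to coarse word classes using surface clues only (no part-of-speech tagger or parser) and then restricts output to candidates above a certainty threshold. The abstract does not specify which surface clues are used or how certainty is computed, so the class names, the regular expressions, the example scores, and the threshold value are all hypothetical assumptions, not the paper's actual model.

```python
import re


def word_class(token: str) -> str:
    """Map a token to a coarse surface-feature class.

    The classes below are illustrative assumptions; the paper's own
    word classes are not described in the abstract.
    """
    if re.fullmatch(r"\d+", token):
        return "NUM"        # digits only, e.g. "2"
    if re.fullmatch(r"[A-Z]+", token):
        return "ALLCAPS"    # upper-case letters only, e.g. "IL"
    if re.fullmatch(r"[A-Za-z]+\d+[A-Za-z\d]*", token):
        return "ALNUM"      # letters followed by digits, e.g. "p53"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "INITCAP"    # capitalized word, e.g. "Ras"
    if re.fullmatch(r"[a-z]+", token):
        return "LOWER"      # lower-case word, e.g. "receptor"
    return "OTHER"


def filter_by_certainty(candidates, threshold):
    """Keep only (name, certainty) pairs with certainty >= threshold.

    Raising the threshold trades recall for precision.
    """
    return [(name, c) for name, c in candidates if c >= threshold]


# Hypothetical tokens and candidate names with made-up certainty scores.
classes = [word_class(t) for t in ["IL", "2", "receptor", "p53", "Ras"]]
candidates = [("IL-2 receptor", 0.92), ("binding", 0.35), ("p53", 0.81)]
confident = filter_by_certainty(candidates, 0.8)
```

Generalizing over word classes rather than individual words is one way to keep the number of probabilistic parameters small, which matches the motivation stated in the abstract; how the original system defines and uses its classes may differ.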