An Efficient Digital Search Algorithm by Using a Double-Array Structure
IEEE Transactions on Software Engineering
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Extracting the names of genes and gene products with a hidden Markov model
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
GAPSCORE: finding gene and protein names one word at a time
Bioinformatics
Tuning support vector machines for biomedical named entity recognition
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Two-phase biomedical NE recognition based on SVMs
BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Protein name tagging for biomedical annotation in text
BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Improving the scalability of semi-Markov conditional random fields for named entity recognition
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Experimental Study on a Two Phase Method for Biomedical Named Entity Recognition
IEICE - Transactions on Information and Systems
Introduction to the bio-entity recognition task at JNLPBA
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Incorporating lexical knowledge into biomedical NE recognition
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Exploiting context for biomedical entity recognition: from syntax to the web
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Adapting an NER-system for German to the biomedical domain
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Exploring deep knowledge resources in biomedical name recognition
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
POSBIOTM-NER in the shared task of BioNLP/NLPBA 2004
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Biomedical named entity recognition using conditional random fields and rich feature sets
JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Expert Systems with Applications: An International Journal
SimSem: fast approximate string matching in relation to semantic category disambiguation
BioNLP '11 Proceedings of BioNLP 2011 Workshop
Adding text mining workflows as web services to the BioCatalogue
Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences
ACM Transactions on Management Information Systems (TMIS) - Special Issue on Informatics for Smart Health and Wellbeing
Hi-index | 0.00 |
When term ambiguity and variability are very high, dictionary-based Named Entity Recognition (NER) is not an ideal solution even though large-scale terminological resources are available. Many researches on statistical NER have tried to cope with these problems. However, it is not straightforward how to exploit existing and additional Named Entity (NE) dictionaries in statistical NER. Presumably, addition of NEs to an NE dictionary leads to better performance. However, in reality, the retraining of NER models is required to achieve this. We have established a novel way to improve the NER performance by addition of NEs to an NE dictionary without retraining. We chose protein name recognition as a case study because it most suffers the problems related to heavy term variation and ambiguity. In our approach, first, known NEs are identified in parallel with Part-of-Speech (POS) tagging based on a general word dictionary and an NE dictionary. Then, statistical NER is trained on the tagger outputs with correct NE labels attached. We evaluated performance of our NER on the standard JNLPBA-2004 data set. The F-score on the test set has been improved from 73.14 to 73.78 after adding the protein names appearing in the training data to the POS tagger dictionary without any model retraining. The performance further increased to 78.72 after enriching the tagging dictionary with test set protein names. Our approach has demonstrated high performance in protein name recognition, which indicates how to make the most of known NEs in statistical NER.