Dictionary-based protein name recognition is often a first step in extracting information from biomedical documents because it can attach ID information to the terms it recognizes. However, dictionary-based approaches face two fundamental difficulties: (1) false recognition, caused mainly by short names; and (2) low recall, caused by spelling variations. In this paper, we tackle the first problem by using machine learning to filter out false positives, and we present two alternative methods for alleviating the second. The first uses approximate string searching; the second expands the dictionary with a probabilistic variant generator, which we propose in this paper. Experiments on the GENIA corpus show that filtering with a naive Bayes classifier greatly improves precision with only a slight loss of recall, yielding a 10.8% improvement in F-measure, and that dictionary expansion with the variant generator gives a further 1.6% improvement, for an F-measure of 66.6%.
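To make the approximate string searching idea concrete, the following is a minimal sketch of dictionary lookup under an edit-distance threshold. It is not the method used in the paper, whose search algorithm and cost settings are not given in this abstract; plain Levenshtein distance, the `max_dist` parameter, the function names, and the toy dictionary with its IDs are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def approximate_lookup(candidate: str,
                       dictionary: Dict[str, str],
                       max_dist: int = 1) -> List[Tuple[str, str, int]]:
    """Return (entry_id, name, distance) for names within max_dist edits.

    Comparison is case-insensitive; results are sorted by distance.
    """
    key = candidate.lower()
    hits = []
    for name, entry_id in dictionary.items():
        d = edit_distance(key, name.lower())
        if d <= max_dist:
            hits.append((entry_id, name, d))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical toy dictionary: the names and IDs are illustrative only.
TOY_DICTIONARY = {
    "NF-kappa B": "P001",
    "interleukin-2": "P002",
    "IL-2": "P003",
}

# A spelling variant is recovered (space instead of hyphen).
print(approximate_lookup("NF kappa B", TOY_DICTIONARY))
# A short name also matches a different term at the same threshold.
print(approximate_lookup("IL-3", TOY_DICTIONARY))
```

The second call hints at why short names are a source of false recognition: with a fixed threshold, one edit is a large fraction of a four-character name, so a distinct term such as "IL-3" falls within range of "IL-2". A filtering step of the kind described in the abstract would be one way to reject such candidates.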