New high-throughput technologies have produced an explosion of biomedical literature, creating a pressing need for automatic information extraction from that literature. Biomedical named entity recognition (NER) from natural language text is indispensable to this end. Current NER approaches are dictionary based, rule based, or machine learning based. Because there is no consolidated nomenclature for most biomedical named entities (NEs), NER systems that rely on limited dictionaries or rules do not seem to perform satisfactorily. In this paper, we build our NER framework on a machine learning model, the conditional random field (CRF), which is a well-known model for other sequence tagging problems. Our framework draws on available resources, including dictionaries, web corpora, and lexical analyzers, and represents them as linguistic features in the CRF model. In experiments on the JNLPBA 2004 data, with minimal post-processing, our system achieves an F-score of 70.2%, which is better than most state-of-the-art systems. On the GENIA 3.02 corpus, our system achieves an F-score of 78.4% for protein names, which is 2.8% higher than the next-best system. We also examine the usefulness of each feature in our CRF model. Our experience could be valuable to other researchers working on machine learning based NER.
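As a rough illustration of what representing external resources as features for a CRF tagger can look like, the following Python sketch turns each token of a sentence into a feature dictionary. It is only a sketch under stated assumptions: the toy dictionary `PROTEIN_DICT`, the feature names, and the helper functions are hypothetical and are not the feature set used in the paper.

```python
import re

# Hypothetical toy dictionary; a real system would load a curated
# biomedical resource instead of this hand-written set.
PROTEIN_DICT = {"il-2", "p53", "nf-kappa"}


def word_shape(token):
    """Collapse character runs into a coarse shape, e.g. 'IL-2' -> 'A-0'."""
    shape = re.sub(r"[A-Z]+", "A", token)
    shape = re.sub(r"[a-z]+", "a", shape)
    shape = re.sub(r"[0-9]+", "0", shape)
    return shape


def token_features(tokens, i):
    """Build an illustrative feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "suffix3": tok[-3:].lower(),
        "has_digit": any(c.isdigit() for c in tok),
        # Dictionary lookup exposed as a binary feature (assumed design).
        "in_dict": tok.lower() in PROTEIN_DICT,
        # Neighbouring words give the model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["IL-2", "gene", "expression"]):
        print(feats)
```

A list of per-token feature dictionaries like this is a common input shape for CRF toolkits. Exposing the dictionary match as one feature among many, rather than as a hard rule, lets the model weigh it against contextual evidence, which is in the spirit of the machine learning based approach described above.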