Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Biomedical named entity recognition using two-phase model based on SVMs
Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Tuning support vector machines for biomedical named entity recognition
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Incorporating non-local information into information extraction systems by Gibbs sampling
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Computational Biology and Chemistry
A skip-chain conditional random field for ranking meeting utterances by importance
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Two learning approaches for protein name extraction
Journal of Biomedical Informatics
Hi-index | 0.01 |
Linear-chain Conditional Random Fields (CRF) has been applied to perform the Named Entity Recognition (NER) task in many biomedical text mining and information extraction systems. However, the linear-chain CRF cannot capture long distance dependency, which is very common in the biomedical literature. In this paper, we propose a novel study of capturing such long distance dependency by defining two principles of constructing skip-edges for a skip-chain CRF: linking similar words and linking words having typed dependencies. The approach is applied to recognize gene/protein mentions in the literature. When tested on the BioCreAtIvE II Gene Mention dataset and GENIA corpus, the approach contributes significant improvements over the linear-chain CRF. We also present in-depth error analysis on inconsistent labeling and study the influence of the quality of skip edges on the labeling performance.