Empirical textual mining to protein entities recognition from PubMed corpus

Authors:
Tyne Liang;Ping-Ke Shih
Affiliations:
Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan;Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan
Venue:
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Year:
2005

Abstract

Named Entity Recognition (NER) from biomedical literature is crucial for biomedical knowledge base automation. In this paper, both empirical-rule and statistical approaches to protein entity recognition are presented and evaluated on a general corpus, GENIA 3.02p, and a new domain-specific corpus, SRC. Experimental results show that the rules derived from SRC are useful even though they are simpler and more general than those used by other rule-based approaches. In addition, a concise HMM-based model with a rich set of features is presented and shown to be robust and competitive with other successful hybrid models. The resolution of coordination variants, which are common in entity recognition, is also addressed: by applying heuristic rules and a clustering strategy, the presented resolver is shown to be feasible.