Motivation: With an overwhelming amount of textual information in molecular biology and biomedicine, there is a need for effective and efficient literature mining and knowledge discovery that can help biologists gather and make use of the knowledge encoded in text documents. To make organized and structured information available, automatic recognition of biomedical entity names is critical for information retrieval, information extraction and automated knowledge acquisition.

Results: In this paper, we present a named entity recognition system for the biomedical domain, called PowerBioNE. To deal with the special naming conventions of the biomedical domain, we propose various evidential features: (1) word formation pattern; (2) morphological pattern, such as prefix and suffix; (3) part-of-speech; (4) head noun trigger; (5) special verb trigger and (6) name alias feature. All the features are integrated effectively and efficiently through a hidden Markov model (HMM) and an HMM-based named entity recognizer. In addition, a k-Nearest Neighbor (k-NN) algorithm is proposed to resolve the data sparseness problem in our system. Finally, we present a pattern-based post-processing step that automatically extracts rules from the training data to deal with the cascaded entity name phenomenon. To the best of our knowledge, PowerBioNE is the first system that deals with the cascaded entity name phenomenon. Evaluation shows that our system achieves F-measures of 66.6 and 62.2 on the 23 classes of GENIA V3.0 and V1.1, respectively. In particular, our system achieves an F-measure of 75.8 on the 'protein' class of GENIA V3.0. For comparison, our system outperforms the best published result by 7.8 on GENIA V1.1, without the help of any dictionaries. It also shows that our HMM and the k-NN algorithm outperform other models, such as back-off HMM, linear interpolated HMM, support vector machines, C4.5, C4.5 rules and RIPPER, by effectively capturing the local context dependency and resolving the data sparseness problem. Moreover, evaluation on GENIA V3.0 shows that the post-processing for the cascaded entity name phenomenon improves the F-measure by 3.9. Finally, error analysis shows that about half of the errors are caused by the strict annotation scheme and the annotation inconsistency in the GENIA corpus. This suggests that our system achieves an acceptable F-measure of 83.6 on the 23 classes of GENIA V3.0 and in particular 86.2 on the 'protein' class, without the help of any dictionaries. We think that an F-measure of 90 on the 23 classes of GENIA V3.0, and in particular 92 on the 'protein' class, can be achieved by refining the annotation scheme of the GENIA corpus (for example, a more flexible annotation scheme and greater annotation consistency) and by including a reasonable biomedical dictionary.

Availability: A demo system is available at http://textmining.i2r.a-star.edu.sg/NLS/demo.htm. A technology license is available upon bilateral agreement.
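To make the first two evidential features in the Results more concrete, the following is a minimal Python sketch of how a word formation pattern and prefix/suffix features might be computed for a single token. It is an illustration only, not PowerBioNE's implementation: the function names, the label set (`AllCaps`, `AlphaDigitMix`, and so on) and the affix length of 3 are all assumptions, since the abstract does not specify them.

```python
import re


def word_formation_pattern(token: str) -> str:
    """Map a token to a coarse orthographic class.

    The label set here is hypothetical; the abstract only names the
    feature type, not its categories.
    """
    if re.fullmatch(r"[0-9]+", token):
        return "AllDigits"
    if re.fullmatch(r"[A-Z]+", token):
        return "AllCaps"
    if re.fullmatch(r"[a-z]+", token):
        return "AllLower"
    if re.fullmatch(r"[A-Z][a-z]+", token):
        return "InitCap"
    if re.search(r"[A-Za-z]", token) and re.search(r"[0-9]", token):
        return "AlphaDigitMix"
    if token.isalpha():
        return "MixedCase"
    return "Other"


def affix_features(token: str, n: int = 3) -> dict:
    """Return lowercase prefix and suffix of length n (n=3 is an assumption)."""
    lower = token.lower()
    return {"prefix": lower[:n], "suffix": lower[-n:]}


def token_features(token: str) -> dict:
    """Combine the orthographic and affix features for one token."""
    features = {"pattern": word_formation_pattern(token)}
    features.update(affix_features(token))
    return features


if __name__ == "__main__":
    for tok in ["IL-2", "kinase", "mRNA", "DNA", "Jurkat"]:
        print(tok, token_features(tok))
```

In a system of the kind described, per-token features like these would be combined with the other listed features (part-of-speech, triggers, aliases) as observations for the sequence model; how PowerBioNE actually does so is not stated in the abstract.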