Two learning approaches for protein name extraction

Authors:
Serhan Tatar;Ilyas Cicekli
Affiliations:
Department of Computer Engineering, Bilkent University, 06800 Bilkent, Ankara, Turkey;Department of Computer Engineering, Bilkent University, 06800 Bilkent, Ankara, Turkey
Venue:
Journal of Biomedical Informatics
Year:
2009

Abstract

Protein name extraction, one of the basic tasks in automatic extraction of information from biological texts, remains challenging. In this paper, we explore two machine learning techniques for this task and report the results of our experiments. The first method uses a bigram language model to extract protein names. The second uses an automatic rule learning method to identify protein names in biological texts. In both cases, we generalize protein names by using hierarchically categorized syntactic token types. We conducted our experiments on two datasets. The method based on the bigram language model achieved an F-score of 67.7% on the YAPEX dataset and 66.8% on the GENIA corpus. The rule learning method achieved an F-score of 61.8% on the YAPEX dataset and 61.0% on the GENIA corpus. The results of the comparative experiments demonstrate that both techniques are applicable to automatic protein name extraction, a prerequisite for the large-scale processing of biomedical literature.
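
As an illustration only, the idea of a bigram language model over generalized token types can be sketched in Python. The token categories (`NUM`, `ALLCAPS`, and so on), the add-one smoothing, and the names `token_type` and `BigramModel` are assumptions made for this sketch. The abstract does not specify the paper's actual type hierarchy, smoothing scheme, or extraction procedure.

```python
import math
from collections import Counter


def token_type(token):
    """Map a token to a coarse syntactic type (hypothetical categories)."""
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        if token.isupper():
            return "ALLCAPS"
        if token[0].isupper():
            return "INITCAP"
        return "LOWER"
    if any(c.isdigit() for c in token) and any(c.isalpha() for c in token):
        return "ALPHANUM"
    return "OTHER"


class BigramModel:
    """Bigram model over type sequences with add-one smoothing (an assumption)."""

    def __init__(self, sequences):
        self.context_counts = Counter()  # how often each type appears as a left context
        self.bigram_counts = Counter()   # how often each (previous, current) pair appears
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def log_prob(self, seq):
        """Return the smoothed log-probability of a type sequence."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        size = len(self.vocab) + 1  # one extra slot for unseen types
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + size
            total += math.log(numerator / denominator)
        return total


# Toy training names, invented for illustration (not taken from YAPEX or GENIA).
training_names = [["interleukin", "2"], ["protein", "kinase", "C"], ["p53"]]
model = BigramModel([[token_type(t) for t in name] for name in training_names])

candidate = [token_type(t) for t in ["cyclin", "1"]]  # types seen in this order
reversed_candidate = [token_type(t) for t in ["1", "cyclin"]]  # order never seen
print(model.log_prob(candidate) > model.log_prob(reversed_candidate))
```

Scoring type sequences instead of raw tokens lets the model assign a reasonable probability to a name it has never seen, provided its shape resembles the training names. Deciding which scored candidates to accept as protein names would need a threshold or comparison step that this sketch leaves out.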