Extracting the names of genes and gene products with a hidden Markov model

Authors:
Nigel Collier; Chikashi Nobata; Jun-ichi Tsujii
Affiliations:
University of Tokyo, Tokyo, Japan (all authors)
Venue:
COLING '00: Proceedings of the 18th Conference on Computational Linguistics, Volume 1
Year:
2000

Abstract

We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) for the task of extracting technical terminology from MEDLINE abstracts and texts in the molecular-biology domain. This is the first stage in a system that will extract event information for automatically updating biology databases. We trained the HMM entirely with bigrams based on lexical and character features in a relatively small corpus of 100 MEDLINE abstracts that were marked up by domain experts with term classes such as proteins and DNA. Using cross-validation, we achieved an F-score of 0.73, and we examine the contribution made by each part of the interpolation model to overcoming data sparseness.
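The abstract names the model without spelling it out. As a rough illustration of the general technique (linear interpolation of relative-frequency estimates taken from progressively coarser contexts, decoded with the Viterbi algorithm), here is a minimal Python sketch. It is not the authors' model: the three back-off contexts, the fixed weights in `lambdas`, the orthographic classes produced by the hypothetical `char_feature` helper, and the toy training sentences are all assumptions made for illustration. The paper's actual interpolation terms and the way it sets the weights are not reproduced here.

```python
import math
import re
from collections import Counter, defaultdict

START = "<s>"  # pseudo-class for the position before the first token


def char_feature(word: str) -> str:
    """Map a token to a coarse orthographic class (hypothetical feature set)."""
    if re.fullmatch(r"\d+", word):
        return "Digits"
    if re.fullmatch(r"[A-Za-z]+-?\d+", word):
        return "LettersAndDigits"  # e.g. IL-2, STAT3, p53
    if re.fullmatch(r"[A-Z]{2,}", word):
        return "AllCaps"
    if word[:1].isupper():
        return "InitCap"
    if word.islower():
        return "Lower"
    return "Other"


class InterpolatedBigramTagger:
    """Scores class_t given (word_t, class_{t-1}) as a linear interpolation of
    relative frequencies from progressively coarser contexts."""

    def __init__(self, lambdas=(0.4, 0.3, 0.2, 0.1)):
        # Hypothetical fixed weights, finest context first; they must sum to 1.
        assert math.isclose(sum(lambdas), 1.0)
        self.lambdas = lambdas
        # One count table per back-off level: context -> Counter(class).
        self.tables = [defaultdict(Counter) for _ in range(3)]
        self.unigram = Counter()

    @staticmethod
    def _contexts(word, prev):
        # Finest to coarsest: (word, prev class), (feature, prev class), prev class alone.
        return [(word, prev), (char_feature(word), prev), prev]

    def train(self, sentences):
        for sentence in sentences:
            prev = START
            for word, cls in sentence:
                for table, ctx in zip(self.tables, self._contexts(word, prev)):
                    table[ctx][cls] += 1
                self.unigram[cls] += 1
                prev = cls

    def prob(self, cls, word, prev):
        total = 0.0
        for lam, table, ctx in zip(self.lambdas, self.tables, self._contexts(word, prev)):
            seen = table.get(ctx)
            if seen:  # an unseen context contributes nothing to the sum
                total += lam * seen[cls] / sum(seen.values())
        # The class unigram is always defined, so every trained class keeps p > 0.
        total += self.lambdas[-1] * self.unigram[cls] / sum(self.unigram.values())
        return total

    def tag(self, words):
        """Viterbi decoding: best[c] holds (log score, path) of the best path ending in c."""
        classes = sorted(self.unigram)
        best = {START: (0.0, [])}
        for word in words:
            best = {
                cls: max(
                    (score + math.log(self.prob(cls, word, prev)), path + [cls])
                    for prev, (score, path) in best.items()
                )
                for cls in classes
            }
        return max(best.values())[1]


# Hypothetical toy data: (token, class) pairs, not the paper's corpus.
train_data = [
    [("IL-2", "PROTEIN"), ("activates", "O"), ("STAT3", "PROTEIN")],
    [("the", "O"), ("p53", "PROTEIN"), ("protein", "O"), ("binds", "O"), ("DNA", "O")],
]
tagger = InterpolatedBigramTagger()
tagger.train(train_data)
print(tagger.tag(["IL-4", "activates", "STAT5"]))  # ['PROTEIN', 'O', 'PROTEIN']
```

The unseen tokens `IL-4` and `STAT5` come out as `PROTEIN` because their orthographic class (letters plus digits) was seen with that class in training even though the words themselves were not; this is the kind of data-sparseness relief that interpolating with coarser contexts is meant to provide. With a realistic corpus, the weights would typically be estimated on held-out data rather than fixed by hand.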