Identifying chemical names in biomedical text: an investigation of the substring co-occurrence based approaches

Authors:
Alexander Vasserman
Affiliations:
University of Pennsylvania, Philadelphia, PA
Venue:
HLT-SRWS '04 Proceedings of the Student Research Workshop at HLT-NAACL 2004
Year:
2004

Abstract

We investigate strategies for identifying chemical names in biomedical text using substring co-occurrence information. The goal is to build a system from readily available data with minimal human involvement. Our models are trained on a dictionary of chemical names and on general biomedical text. We evaluate several approaches, including Naïve Bayes classifiers and several types of N-gram models, and introduce a new way of interpolating N-grams that does not require tuning any parameters. We also find the task to be similar to Language Identification.
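
As a rough illustration of the general idea, and not the paper's actual models, the sketch below scores a token under two character trigram models, one trained on chemical names and one on general text, and picks the higher-scoring model, much as a language identifier would. The word lists, the function names, the add-one smoothing, and the alphabet size of 64 are all assumptions made for this example; in particular, add-one smoothing stands in for the parameter-free interpolation mentioned above, which the abstract does not specify.

```python
import math
from collections import Counter


def train_char_ngrams(strings, n=3):
    """Count character n-grams and their (n-1)-character contexts."""
    ngrams, contexts = Counter(), Counter()
    for s in strings:
        # "^" and "$" are hypothetical start/end padding symbols.
        padded = "^" * (n - 1) + s.lower() + "$"
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    return ngrams, contexts


def log_prob(token, model, n=3, vocab_size=64):
    """Add-one smoothed log-probability of a token under one model.

    vocab_size is an assumed alphabet size, not a value from the paper.
    """
    ngrams, contexts = model
    padded = "^" * (n - 1) + token.lower() + "$"
    total = 0.0
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        total += math.log((ngrams[gram] + 1) / (contexts[gram[:-1]] + vocab_size))
    return total


def looks_chemical(token, chem_model, text_model, n=3):
    """Return True if the chemical-name model assigns the higher score."""
    return log_prob(token, chem_model, n) > log_prob(token, text_model, n)


# Toy training data, invented for this example only.
CHEMICAL_NAMES = ["ethanol", "methanol", "propanol", "butanol",
                  "acetone", "benzene", "toluene"]
GENERAL_WORDS = ["patient", "treatment", "results", "study",
                 "protein", "disease", "clinical"]

chem_model = train_char_ngrams(CHEMICAL_NAMES)
text_model = train_char_ngrams(GENERAL_WORDS)

for word in ["pentanol", "patients"]:
    print(word, looks_chemical(word, chem_model, text_model))
```

With realistic inputs the two models would be trained on a full chemical dictionary and a large biomedical corpus rather than on a handful of words, and the smoothing scheme would matter far more than it does here.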