An unsupervised method for extracting domain-specific affixes in biological literature

Authors:
Haibin Liu;Christian Blouin;Vlado Kešelj
Affiliations:
Dalhousie University, Canada;Dalhousie University, Canada;Dalhousie University, Canada
Venue:
BioNLP '07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing
Year:
2007

Citing 7
Cited 1

PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric

Journal of the ACM (JACM)
Introduction to Machine Learning (Adaptive Computation and Machine Learning)

Introduction to Machine Learning (Adaptive Computation and Machine Learning)
Two-phase biomedical NE recognition based on SVMs

BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Effective adaptation of a Hidden Markov Model-based named entity recognizer for biomedical domain

BioMed '03 Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine - Volume 13
Mining semantically related terms from biomedical literature

ACM Transactions on Asian Language Information Processing (TALIP)
Exploiting context for biomedical entity recognition: from syntax to the web

JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications
Biomedical named entity recognition using conditional random fields and rich feature sets

JNLPBA '04 Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications

Sentence identification of biological interactions using PATRICIA tree generated patterns and genetic algorithm optimized parameters

Data & Knowledge Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose an unsupervised method to automatically extract domain-specific prefixes and suffixes from biological corpora based on the use of PATRICIA tree. The method is evaluated by integrating the extracted affixes into an existing learning-based biological term annotation system. The system based on our method achieves comparable experimental results to the original system in locating biological terms and exact term matching annotation. However, our method improves the system efficiency by significantly reducing the feature set size. Additionally, the method achieves a better performance with a small training data set. Since the affix extraction process is unsupervised, it is assumed that the method can be generalized to extract domain-specific affixes from other domains, thus assisting in domain-specific concept recognition.