Bio-medical entity extraction using support vector machines

Authors:
Koichi Takeuchi; Nigel Collier
Affiliations:
Okayama University, 3-1-1 Tsushima-naka, Okayama-shi, Okayama 700-8530, Japan; National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
Venue:
Artificial Intelligence in Medicine
Year:
2005

Abstract

Objective: Support vector machines (SVMs) have achieved state-of-the-art performance in several classification tasks. In this article we apply them to the identification and semantic annotation of scientific and technical terminology in the domain of molecular biology. This illustrates the extensibility of the traditional named entity task to special domains with large-scale terminologies, such as those in medicine and related disciplines.

Methods and materials: The foundation for the model is a sample of text annotated by a domain expert according to an ontology of concepts, properties and relations. The model then learns to annotate unseen terms in new texts and contexts. The results can be used for a variety of intelligent language processing applications. We illustrate the capabilities of SVMs using a sample of 100 journal abstracts taken from the {human, blood cell, transcription factor} domain of MEDLINE.

Results: Approximately 3400 terms are annotated and the model performs at about 74% F-score on cross-validation tests. A detailed analysis based on empirical evidence shows the contribution of various feature sets to performance.

Conclusion: Our experiments indicate a relationship between feature window size and the amount of training data, and that a combination of surface words, orthographic features and head noun features achieves the best performance among the feature sets tested.
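
To make the terms "feature window", "surface words", "orthographic features" and "F-score" concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the orthographic classes, the padding symbol, the feature names and the function names are all assumptions chosen for illustration, and the F-score is assumed to be the balanced harmonic mean of precision and recall. Head noun features are omitted because the abstract does not say how they are derived.

```python
import re

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def orthographic_class(token: str) -> str:
    """Map a token to a coarse orthographic class (illustrative classes only)."""
    if re.fullmatch(r"\d+", token):
        return "Digits"
    if re.fullmatch(r"[A-Z]+", token):
        return "AllCaps"
    if re.search(r"\d", token) and re.search(r"[A-Za-z]", token):
        return "AlphaDigit"
    if "-" in token:
        return "Hyphenated"
    if token[:1].isupper():
        return "InitCap"
    if token.islower():
        return "Lower"
    return "Other"


def window_features(tokens, i, k=2):
    """Collect surface-word and orthographic features for the tokens
    at positions i-k .. i+k around the focus token i."""
    feats = {}
    for offset in range(-k, k + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"w[{offset}]"] = tokens[j].lower()
            feats[f"orth[{offset}]"] = orthographic_class(tokens[j])
        else:
            feats[f"w[{offset}]"] = PAD
            feats[f"orth[{offset}]"] = PAD
    return feats


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = ["NF-kappa", "B", "activates", "IL-2", "transcription"]
    print(window_features(sentence, 3, k=1))
    print(f_score(tp=3, fp=1, fn=1))
```

Each feature dictionary would then be turned into a sparse vector and passed to a classifier that predicts a class label for the focus token. A larger `k` yields more features per token, which is one way to picture why window size and the amount of training data might interact.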