Ontology-guided feature engineering for clinical text classification

Authors:
Vijay N. Garla;Cynthia Brandt
Affiliations:
Interdepartmental Program in Computational Biology & Bioinformatics, Yale University, 300 George Street, Suite 501, New Haven, CT 06520-8009, United States;Connecticut VA Healthcare System, Bldg. 35A, Room 213 (11-ACSLG), 950 Campbell Avenue, West Haven, CT 06516, United States and Yale Center for Medical Informatics, Yale University, 300 George Stre ...
Venue:
Journal of Biomedical Informatics
Year:
2012

Abstract

In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning-based clinical text classification. Critical steps in clinical text classification include identifying the features and passages relevant to the classification task, and representing clinical text in a way that enables discrimination between documents of different classes. We developed novel information-theoretic techniques that use the taxonomical structure of the UMLS to improve feature ranking, and a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Informatics for Integrating Biology and the Bedside (i2b2) obesity challenge. The methods improve upon the results of this challenge's top machine-learning-based system, and may improve the performance of other machine-learning-based clinical text classification systems. All tools developed as part of this study have been released as open source and are available at http://code.google.com/p/ytex.
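
The abstract does not spell out the ranking technique, so the following is only a minimal sketch of the general idea of combining an information-theoretic score with a concept taxonomy; it is not the authors' algorithm and does not use the YTEX tools. It assumes documents are already mapped to sets of concept identifiers, uses a hypothetical toy child-to-parent taxonomy (the names in `PARENT` are invented and not taken from the UMLS), and scores each concept by plain information gain after adding every concept's ancestors to the document.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); illustrative only, not the UMLS.
PARENT = {
    "type_2_diabetes": "diabetes",
    "type_1_diabetes": "diabetes",
    "diabetes": "endocrine_disorder",
    "hypothyroidism": "endocrine_disorder",
}


def with_ancestors(concepts):
    """Return the given concepts plus all of their ancestors in PARENT."""
    expanded = set()
    for concept in concepts:
        # Walk up the taxonomy; stop at the root or at an already-visited node.
        while concept is not None and concept not in expanded:
            expanded.add(concept)
            concept = PARENT.get(concept)
    return expanded


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(docs, labels, feature):
    """Reduction in label entropy from knowing whether `feature` is present."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def rank_features(docs, labels):
    """Rank concepts by information gain after ancestor expansion."""
    expanded = [with_ancestors(d) for d in docs]
    vocabulary = set().union(*expanded)
    scores = {f: information_gain(expanded, labels, f) for f in vocabulary}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [{"type_2_diabetes"}, {"type_1_diabetes"}, {"hypothyroidism"}, set()]
    labels = [1, 1, 0, 0]
    for feature, score in rank_features(docs, labels):
        print(f"{feature}: {score:.3f}")
```

In this toy data neither specific diabetes concept separates the classes on its own, but their shared ancestor `diabetes` does, which is the kind of effect a taxonomy-aware ranking can exploit.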