In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning-based clinical text classification. Two steps are critical in clinical text classification: identifying the features and passages relevant to the classification task, and representing clinical text so that documents of different classes can be discriminated. For the first step, we developed information-theoretic techniques that use the taxonomical structure of the UMLS to improve feature ranking. For the second, we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Informatics for Integrating Biology and the Bedside (i2b2) obesity challenge. The methods improve upon the results of this challenge's top machine-learning-based system, and may improve the performance of other machine-learning-based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.
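To make the idea of taxonomy-aware, information-theoretic feature ranking concrete, the following is a minimal sketch and not the method or code released with this study. It assumes a generic information-gain score and a hypothetical toy taxonomy (the `PARENT` mapping, the concept names, and the four example documents are all invented for illustration). Each document's concepts are expanded with their ancestors, so that evidence spread across sibling concepts accumulates on a shared parent before features are ranked.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(has_feature, labels):
    """Reduction in label entropy obtained by knowing a binary feature."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for f, y in zip(has_feature, labels) if f == value]
        if subset:
            gain -= (len(subset) / total) * entropy(subset)
    return gain


# Hypothetical toy taxonomy (child -> parent); a stand-in, not real UMLS data.
PARENT = {
    "type_1_diabetes": "diabetes",
    "type_2_diabetes": "diabetes",
    "diabetes": "endocrine_disease",
}


def with_ancestors(concepts):
    """Add every taxonomy ancestor of each concept found in a document."""
    expanded = set()
    for concept in concepts:
        while concept is not None:
            expanded.add(concept)
            concept = PARENT.get(concept)
    return expanded


def rank_features(doc_concepts, labels):
    """Rank concepts by information gain, highest first (ties broken by name)."""
    vocab = set().union(*doc_concepts)
    scores = {
        f: information_gain([f in doc for doc in doc_concepts], labels)
        for f in vocab
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented example documents: (concepts mentioned, class label).
docs = [
    ({"type_2_diabetes"}, "obese"),
    ({"type_1_diabetes"}, "obese"),
    ({"asthma"}, "not_obese"),
    ({"fracture"}, "not_obese"),
]
labels = [label for _, label in docs]
raw_docs = [concepts for concepts, _ in docs]
expanded_docs = [with_ancestors(concepts) for concepts in raw_docs]

ranking = rank_features(expanded_docs, labels)
for concept, score in ranking:
    print(f"{concept}: {score:.3f}")
```

In this toy data neither leaf concept separates the classes on its own, while the shared ancestor does, which is the kind of effect a taxonomy-aware ranking is meant to capture. The actual ranking techniques developed in the study may differ substantially from this sketch.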