This paper addresses the specific problem of identifying patients diagnosed with a given condition for potential recruitment into a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and related conditions by automatically classifying clinical notes dictated at Mayo Clinic. The method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. Documents are represented as feature vectors whose features combine demographic information with single words and concept mappings to the MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term-spotting method with respect to their accuracy and their recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples than Perceptron (95% vs. 86%) but worse accuracy (57% vs. 65%). Both algorithms outperform the baseline, which achieves 71% recall on positive samples and 54% accuracy.
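The two evaluation measures named above, accuracy and recall on positive samples, can be illustrated with a minimal Python sketch applied to a term-spotting baseline. Everything in it is an assumption made for illustration: the term list `TERMS`, the helper names, and the five toy notes are hypothetical and are not the terms, code, or data used in the study.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of all notes whose predicted label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def positive_recall(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of gold-positive notes that were predicted positive."""
    preds_for_positives = [p for g, p in zip(gold, pred) if g == 1]
    if not preds_for_positives:
        raise ValueError("no positive samples in gold labels")
    return sum(preds_for_positives) / len(preds_for_positives)


def term_spotting(note: str, terms: Sequence[str]) -> int:
    """Label a note positive (1) if any term occurs in it, else negative (0)."""
    text = note.lower()
    return int(any(term in text for term in terms))


# Hypothetical term list and toy notes, invented for this sketch only.
TERMS = ("congestive heart failure", "chf")
notes = [
    "Patient with congestive heart failure, on diuretics.",  # gold positive
    "History of CHF, stable.",                               # gold positive
    "Cardiomyopathy with pulmonary edema.",                  # gold positive
    "No evidence of CHF.",                                   # gold negative
    "Routine visit for knee pain.",                          # gold negative
]
gold = [1, 1, 1, 0, 0]

pred = [term_spotting(note, TERMS) for note in notes]
acc = accuracy(gold, pred)
rec = positive_recall(gold, pred)
print(f"accuracy={acc:.2f} positive_recall={rec:.2f}")
```

On these toy notes the baseline misses a positive note that never uses the listed terms and flags a negated mention as positive, which shows how recall on positive samples and overall accuracy can diverge. A trained classifier's predictions could be passed to the same two functions in place of `pred`.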