Prospective recruitment of patients with congestive heart failure using an ad-hoc binary classifier

Authors:
Serguei V. Pakhomov;James Buntrock;Christopher G. Chute
Affiliations:
Division of Biomedical Informatics, Mayo Clinic College of Medicine, SW, Rochester, MN;Division of Biomedical Informatics, Mayo Clinic College of Medicine, SW, Rochester, MN;Division of Biomedical Informatics, Mayo Clinic College of Medicine, SW, Rochester, MN
Venue:
Journal of Biomedical Informatics
Year:
2005


Abstract

This paper addresses the problem of identifying patients diagnosed with a particular condition for potential recruitment into a clinical trial or an epidemiological study. We present a simple machine learning method for identifying patients diagnosed with congestive heart failure and related conditions by automatically classifying clinical notes dictated at Mayo Clinic. The method relies on an automatic classifier trained on comparable amounts of positive and negative samples of clinical notes previously categorized by human experts. Documents are represented as feature vectors whose features combine demographic information with single words and concept mappings to the MeSH and HICDA classification systems. We compare two simple and efficient classification algorithms (Naïve Bayes and Perceptron) and a baseline term-spotting method with respect to their accuracy and their recall on positive samples. Depending on the test set, we find that Naïve Bayes yields better recall on positive samples than Perceptron (95% vs. 86%) but worse accuracy (57% vs. 65%). Both algorithms outperform the baseline, which achieves 71% recall on positive samples and 54% accuracy.
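
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature format, the toy notes, the term list, and all function names are assumptions made for illustration, and the MeSH and HICDA concept mappings are omitted. It shows a term-spotting baseline, a multinomial Naïve Bayes classifier with add-one smoothing, a basic perceptron, and the two evaluation measures (accuracy and recall on positive samples).

```python
import math
from collections import Counter


def featurize(note, sex=None):
    """Bag-of-words features plus an optional demographic feature (hypothetical format)."""
    feats = Counter("w=" + tok for tok in note.lower().split())
    if sex is not None:
        feats["sex=" + sex] += 1
    return feats


def term_spotting(note, terms=("heart failure",)):
    """Baseline: flag a note if any listed phrase occurs in it (term list is assumed)."""
    text = note.lower()
    return any(term in text for term in terms)


def train_naive_bayes(samples):
    """Collect counts for multinomial Naive Bayes; assumes both classes occur in samples."""
    class_counts = Counter()
    feat_counts = {True: Counter(), False: Counter()}
    vocab = set()
    for feats, label in samples:
        class_counts[label] += 1
        feat_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, feat_counts, vocab


def predict_naive_bayes(model, feats):
    """Compare class log-probabilities with add-one smoothing; unseen features are skipped."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        denom = sum(feat_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        for f, n in feats.items():
            if f in vocab:
                score += n * math.log((feat_counts[label][f] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]


def train_perceptron(samples, epochs=10):
    """Mistake-driven perceptron updates over sparse feature counts."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, label in samples:
            y = 1 if label else -1
            activation = bias + sum(weights[f] * n for f, n in feats.items())
            if y * activation <= 0:  # misclassified (or on the boundary): update
                for f, n in feats.items():
                    weights[f] += y * n
                bias += y
    return weights, bias


def predict_perceptron(model, feats):
    weights, bias = model
    return bias + sum(weights[f] * n for f, n in feats.items()) > 0


def accuracy(gold, pred):
    """Fraction of all samples labeled correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def recall_on_positives(gold, pred):
    """Fraction of truly positive samples that were predicted positive."""
    hits = [p for g, p in zip(gold, pred) if g]
    return sum(hits) / len(hits)


# Toy, invented notes standing in for expert-categorized clinical notes.
toy_notes = [
    ("patient with congestive heart failure", True),
    ("heart failure exacerbation noted", True),
    ("routine visit no cardiac complaints", False),
    ("knee pain after fall", False),
]
train = [(featurize(text), label) for text, label in toy_notes]
nb_model = train_naive_bayes(train)
pc_model = train_perceptron(train)
```

With models like these, each classifier's predictions on a held-out test set would be passed to `accuracy` and `recall_on_positives`. Reporting the two measures separately matters for recruitment, where missing an eligible patient may be more costly than reviewing a false alarm.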