Journal of Biomedical Informatics
Cohort identification is an important step in conducting clinical research studies. Using ICD-9 codes to identify disease cohorts is a common approach that can yield satisfactory results for certain conditions; for many use cases, however, more accurate methods are required. In this study, we propose a bootstrapping method that supplements ICD-9 codes with other data, such as lab results and medications, to build classification models that identify cohorts more accurately. The proposed method does not require prior information about the true class of the patients. We used the method to identify Diabetes Mellitus (DM) and Hyperlipidemia (HL) patient cohorts from a database of 800,000 patients. The method labeled as positive for DM 11,000 patients who had no DM-related ICD-9 codes, and as positive for HL 52,000 patients who had no HL-related codes. A review of 400 patient charts (200 per condition) by two clinicians shows that, for both conditions studied, the labels assigned by the proposed approach are more consistent with the clinicians' labels than those derived from ICD-9 codes alone. The method is reasonably automated and, we believe, holds potential for inexpensive, more accurate cohort identification.
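The general idea of bootstrapping from noisy ICD-9 labels can be illustrated with a minimal sketch. The abstract does not specify the classifier, the features, or the stopping rule, so everything below is an assumption made for illustration: a nearest-centroid classifier stands in for the actual classification model, the two features (an HbA1c lab value and a medication flag) are hypothetical, and the function name `bootstrap_labels` is invented here. It is not the authors' implementation.

```python
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bootstrap_labels(features, icd9_flags, max_rounds=10):
    """Iteratively relabel patients, starting from noisy ICD-9 flags.

    features   : list of numeric feature vectors (hypothetical: lab value, drug flag)
    icd9_flags : list of bool, True if the patient has a disease-related ICD-9 code
    Returns the final list of bool labels.
    """
    labels = list(icd9_flags)  # round 0: trust the ICD-9 codes as-is
    for _ in range(max_rounds):
        pos = [f for f, y in zip(features, labels) if y]
        neg = [f for f, y in zip(features, labels) if not y]
        if not pos or not neg:
            break  # cannot fit a two-class model with an empty class
        c_pos, c_neg = centroid(pos), centroid(neg)
        # Stand-in classifier (assumption): assign each patient to the nearer centroid.
        new_labels = [sq_dist(f, c_pos) < sq_dist(f, c_neg) for f in features]
        if new_labels == labels:
            break  # labels have stabilized
        labels = new_labels
    return labels


# Toy data (fabricated for illustration only): [HbA1c, on_diabetes_medication]
features = [[8.1, 1], [7.9, 1], [8.4, 1], [5.2, 0], [5.4, 0], [5.0, 0]]
# The third patient looks diabetic in labs and medications but has no ICD-9 code.
icd9_flags = [True, True, False, False, False, False]

final_labels = bootstrap_labels(features, icd9_flags)
print(final_labels)
```

In this toy run the third patient, who has no ICD-9 code, ends up labeled positive because the lab and medication features resemble those of the coded patients. That is the kind of correction the abstract reports at scale, though the real method presumably uses a stronger classifier and richer features.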