Using statistical text mining to supplement the development of an ontology

  • Authors:
  • Stephen Luther; Donald Berndt; Dezon Finch; Matthew Richardson; Edward Hickling; David Hickam

  • Affiliations:
  • Consortium for Healthcare Informatics Research (CHIR) and VA HSR&D/RR&D Center of Excellence: Maximizing Rehabilitation Outcomes, Tampa, FL, United States; Consortium for Healthcare Informatics Research (CHIR) and College of Business, University of South Florida, Tampa, FL, United States; Consortium for Healthcare Informatics Research (CHIR) and VA HSR&D/RR&D Center of Excellence: Maximizing Rehabilitation Outcomes, Tampa, FL, United States; Consortium for Healthcare Informatics Research (CHIR) and VA HSR&D/RR&D Center of Excellence: Maximizing Rehabilitation Outcomes, Tampa, FL, United States; Consortium for Healthcare Informatics Research (CHIR) and VA HSR&D/RR&D Center of Excellence: Maximizing Rehabilitation Outcomes, Tampa, FL, United States; Consortium for Healthcare Informatics Research (CHIR) and HSR&D Research Enhancement Program, Portland VA Medical Center, Portland, OR, United States and Department of Medicine, Oregon Health and ...

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2011

Abstract

Statistical text mining (STM) was used to supplement efforts to develop a clinical vocabulary for post-traumatic stress disorder (PTSD) in the Department of Veterans Affairs (VA). Outpatient progress notes were collected at one VA hospital for a cohort of 405 unique veterans with PTSD and a comparison group of 392 with other psychological conditions. Two methods were employed: (1) "multi-model term scoring", which used stepwise logistic regression to develop 21 separate models by crossing three frequency-weighting options with seven term-weighting options (3 × 7 = 21), and (2) "iterative term refinement", which applied a standard stop list followed by clinical review to eliminate non-clinical terms and terms unrelated to PTSD. Two clinicians reviewed the combined results of the two methods, yielding 226 unique PTSD-related terms. These results were compared with ongoing efforts to identify terms through literature review, focus groups with clinicians treating PTSD, and review of an existing vocabulary; the comparison lends support to the contributions of the STM analyses.
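The 21 models in multi-model term scoring come from crossing three frequency-weighting options with seven term-weighting options. The Python sketch below illustrates that grid under loudly stated assumptions. The abstract does not name the options, so the three frequency weights (`none`, `binary`, `log`) and the seven term weights (`none`, `idf`, `gf_idf`, `normal`, `entropy`, `chi_squared`, `mutual_info`) are illustrative choices, not necessarily the study's. The study fit a stepwise logistic regression per configuration; to stay dependency-free, the hypothetical helper `top_terms` swaps in a much simpler stand-in (ranking terms by the absolute difference in mean weighted value between the two groups). The hypothetical `multi_model_scores` pools models by counting how often each term is selected, which is likewise an assumption, and the toy notes are invented.

```python
import math
from collections import Counter
from itertools import product

# ASSUMPTION: the abstract only says there were three frequency weights;
# these names and formulas are illustrative, not the paper's.
FREQ_WEIGHTS = {
    "none": lambda f: float(f),
    "binary": lambda f: 1.0 if f > 0 else 0.0,
    "log": lambda f: math.log2(1.0 + f),
}


def term_stats(docs, labels, term):
    """Per-term statistics: counts per note, n, document freq, global freq, 2x2 table."""
    counts = [doc[term] for doc in docs]
    n, df, gf = len(docs), sum(1 for f in counts if f > 0), sum(counts)
    # Cells: present&case, present&comparison, absent&case, absent&comparison.
    a = sum(1 for f, y in zip(counts, labels) if f > 0 and y == 1)
    b = df - a
    c = sum(labels) - a
    d = n - a - b - c
    return counts, n, df, gf, (a, b, c, d)


def entropy_weight(counts, n, df, gf, table):
    # 1 + sum(p log p) / log n: 1 if the term occurs in one note, 0 if spread evenly (needs n >= 2).
    h = sum((f / gf) * math.log2(f / gf) for f in counts if f > 0)
    return 1.0 + h / math.log2(n)


def chi_squared_weight(counts, n, df, gf, table):
    a, b, c, d = table
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def mutual_info_weight(counts, n, df, gf, table):
    a, b, c, d = table
    cells = ((a, a + b, a + c), (b, a + b, b + d), (c, c + d, a + c), (d, c + d, b + d))
    # Sum of P(x, y) * log2(P(x, y) / (P(x) P(y))) over non-empty cells.
    return sum((j / n) * math.log2(j * n / (row * col)) for j, row, col in cells if j)


# ASSUMPTION: seven illustrative term weights (the abstract does not name them).
TERM_WEIGHTS = {
    "none": lambda counts, n, df, gf, table: 1.0,
    "idf": lambda counts, n, df, gf, table: math.log2(n / df),
    "gf_idf": lambda counts, n, df, gf, table: gf / df,
    "normal": lambda counts, n, df, gf, table: 1.0 / math.sqrt(sum(f * f for f in counts)),
    "entropy": entropy_weight,
    "chi_squared": chi_squared_weight,
    "mutual_info": mutual_info_weight,
}


def top_terms(docs, labels, freq, weight, k):
    """STAND-IN for one stepwise logistic regression model: rank terms by the
    absolute gap in mean weighted value between case and comparison notes."""
    vocab = sorted(set().union(*docs))
    n_case = sum(labels)
    n_comp = len(labels) - n_case
    scores = {}
    for term in vocab:
        counts, n, df, gf, table = term_stats(docs, labels, term)
        w = TERM_WEIGHTS[weight](counts, n, df, gf, table)
        vals = [FREQ_WEIGHTS[freq](f) * w for f in counts]
        case = sum(v for v, y in zip(vals, labels) if y == 1) / n_case
        comp = sum(v for v, y in zip(vals, labels) if y == 0) / n_comp
        scores[term] = abs(case - comp)
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def multi_model_scores(docs, labels, k):
    """Count how many of the 3 x 7 = 21 configurations select each term."""
    votes = Counter()
    for freq, weight in product(FREQ_WEIGHTS, TERM_WEIGHTS):
        votes.update(top_terms(docs, labels, freq, weight, k))
    return votes


# Hypothetical toy notes; labels: 1 = PTSD cohort, 0 = comparison group.
DOCS = [
    Counter("nightmares nightmares sleep patient".split()),
    Counter("nightmares flashbacks patient".split()),
    Counter("sleep patient".split()),
    Counter("mood patient".split()),
]
LABELS = [1, 1, 0, 0]
votes = multi_model_scores(DOCS, LABELS, k=1)
print(votes.most_common())
```

On these toy notes, `nightmares` is selected by most configurations, while two binary-frequency configurations pick `flashbacks` instead. The weighting choice alone changes which term a model favors, which is one reason pooling across many weightings can be informative.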