A machine learning approach to college drinking prediction and risk factor identification

  • Authors:
  • Jinbo Bi;Jiangwen Sun;Yu Wu;Howard Tennen;Stephen Armeli

  • Affiliations:
  • University of Connecticut;University of Connecticut;University of Connecticut;University of Connecticut Health Center;Fairleigh Dickinson University

  • Venue:
  • ACM Transactions on Intelligent Systems and Technology (TIST) - Survey papers, special sections on the semantic adaptive social web, intelligent systems for health informatics, regular papers
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Alcohol misuse is one of the most serious public health problems facing adolescents and young adults in the United States. National statistics shows that nearly 90% of alcohol consumed by youth under 21 years of age involves binge drinking and 44% of college students engage in high-risk drinking activities. Conventional alcohol intervention programs, which aim at installing either an alcohol reduction norm or prohibition against underage drinking, have yielded little progress in controlling college binge drinking over the years. Existing alcohol studies are deductive where data are collected to investigate a psychological/behavioral hypothesis, and statistical analysis is applied to the data to confirm the hypothesis. Due to this confirmatory manner of analysis, the resulting statistical models are cohort-specific and typically fail to replicate on a different sample. This article presents two machine learning approaches for a secondary analysis of longitudinal data collected in college alcohol studies sponsored by the National Institute on Alcohol Abuse and Alcoholism. Our approach aims to discover knowledge, from multiwave cohort-sequential daily data, which may or may not align with the original hypothesis but quantifies predictive models with higher likelihood to generalize to new samples. We first propose a so-called temporally-correlated support vector machine to construct a classifier as a function of daily moods, stress, and drinking expectancies to distinguish days with nighttime binge drinking from days without for individual students. We then propose a combination of cluster analysis and feature selection, where cluster analysis is used to identify drinking patterns based on averaged daily drinking behavior and feature selection is used to identify risk factors associated with each pattern. We evaluate our methods on two cohorts of 530 total college students recruited during the Spring and Fall semesters, respectively. Cross validation on these two cohorts and further on 100 random partitions of the total students demonstrate that our methods improve the model generalizability in comparison with traditional multilevel logistic regression. The discovered risk factors and the interaction of these factors delineated in our models can set a potential basis and offer insights to a new design of more effective college alcohol interventions.