Massive Parallelization of Serial Inference Algorithms for a Complex Generalized Linear Model

  • Authors: Marc A. Suchard; Shawn E. Simpson; Ivan Zorych; Patrick Ryan; David Madigan
  • Affiliations: University of California, Los Angeles; Columbia University; Columbia University; Johnson & Johnson; Columbia University
  • Venue: ACM Transactions on Modeling and Computer Simulation (TOMACS) - Special Issue on Monte Carlo Methods in Statistics
  • Year: 2013

Abstract

Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases, such as claims databases or electronic health record systems, are attracting particular attention in this regard, but they present significant methodological and computational challenges. In this article, we show how high-performance statistical computation, including the use of graphics processing units (relatively inexpensive, highly parallel computing devices), can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics, and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety.
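To make the core idea concrete, the following is a minimal serial sketch of cyclic coordinate descent, not the authors' implementation. It makes several simplifying assumptions that the abstract does not state: an ordinary (unconditioned) logistic regression stands in for the conditioned generalized linear model, every coefficient receives an independent zero-mean normal prior with variance `prior_variance`, and each coordinate takes a single clamped Newton step. The names `ccd_logistic`, `prior_variance`, and `max_step` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ccd_logistic(X, y, prior_variance=1.0, max_sweeps=200, tol=1e-8, max_step=1.0):
    """MAP estimate for logistic regression via cyclic coordinate descent.

    Assumption: zero-mean normal prior on every coefficient (including any
    intercept column), so the objective is the negative log posterior.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    eta = [0.0] * n  # cached linear predictors, kept equal to X @ beta

    for _ in range(max_sweeps):
        largest_change = 0.0
        for j in range(p):  # visit coordinates in a fixed cyclic order
            # One-dimensional gradient and Hessian of the negative log posterior.
            grad = beta[j] / prior_variance
            hess = 1.0 / prior_variance
            for i in range(n):
                x = X[i][j]
                if x == 0.0:
                    continue  # skip zeros, as a sparse format would
                mu = sigmoid(eta[i])
                grad += x * (mu - y[i])
                hess += x * x * mu * (1.0 - mu)

            # Newton step, clamped to a fixed interval as a crude safeguard.
            step = max(-max_step, min(max_step, -grad / hess))
            if step != 0.0:
                beta[j] += step
                for i in range(n):
                    if X[i][j] != 0.0:
                        eta[i] += step * X[i][j]  # incremental update of X @ beta
            largest_change = max(largest_change, abs(step))

        if largest_change < tol:
            break
    return beta


# Tiny illustrative data set: an intercept column plus one predictor.
X_demo = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_demo = [0, 0, 1, 1]
print(ccd_logistic(X_demo, y_demo))
```

The two inner loops over observations (accumulating the gradient and Hessian, then updating the cached linear predictors) dominate the cost when there are tens of millions of rows. In this sketch they are a sum-reduction and an element-wise update, which are the kinds of operations that map naturally onto highly parallel hardware such as graphics processing units.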