Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or profiles. This can be due to the presence of heterogeneous cancer subtypes within a supposedly homogeneous population.

Results: We propose a novel statistical method for automatically detecting such subtypes or clusters. Our approach is model-based: each cluster is defined in terms of a sparse profile, which contains the locations of unusually frequent alterations. The profile is represented as a hidden Markov model. Samples are assigned to clusters based on their similarity to the cluster's profile. We simultaneously infer the cluster assignments and the cluster profiles using an expectation-maximization-like algorithm. We show, using a realistic simulation study, that our method is significantly more accurate than standard clustering techniques. We then apply our method to two clinical datasets. In particular, we examine previously reported aCGH data from a cohort of 106 follicular lymphoma patients, and discover clusters that are known to correspond to clinically relevant subgroups. In addition, we examine a cohort of 92 diffuse large B-cell lymphoma patients, and discover previously unreported clusters of biological interest which have inspired follow-up clinical research on an independent cohort.

Availability: Software and synthetic datasets are available at http://www.cs.ubc.ca/~sshah/acgh as part of the CNA-HMMer package.

Contact: sshah@bccrc.ca

Supplementary information: Supplementary data are available at Bioinformatics online.
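Illustrative sketch: the alternation between assigning samples to clusters and re-estimating cluster profiles, as described under Results, can be illustrated with a deliberately simplified toy model. This is not the CNA-HMMer implementation. As assumptions made only for illustration, each profile here is an independent per-probe categorical distribution rather than a hidden Markov model, assignments are hard rather than probabilistic, copy number calls are encoded as 0 = loss, 1 = neutral, 2 = gain, and all function names are hypothetical.

```python
import math

STATES = 3  # assumed encoding: 0 = loss, 1 = neutral, 2 = gain


def hamming(a, b):
    """Number of probes at which two samples differ."""
    return sum(x != y for x, y in zip(a, b))


def init_labels(samples, k):
    """Deterministic start: farthest-point seeds, then nearest-seed labels."""
    seeds = [samples[0]]
    while len(seeds) < k:
        seeds.append(max(samples, key=lambda s: min(hamming(s, t) for t in seeds)))
    return [min(range(k), key=lambda c: hamming(s, seeds[c])) for s in samples]


def fit_profiles(samples, labels, k, pseudo=1.0):
    """M-like step: smoothed state frequencies per cluster and probe."""
    n_pos = len(samples[0])
    counts = [[[pseudo] * STATES for _ in range(n_pos)] for _ in range(k)]
    for sample, c in zip(samples, labels):
        for pos, state in enumerate(sample):
            counts[c][pos][state] += 1.0
    return [[[v / sum(row) for v in row] for row in cluster] for cluster in counts]


def log_likelihood(sample, profile):
    """Log-probability of one sample under one cluster profile."""
    return sum(math.log(profile[pos][state]) for pos, state in enumerate(sample))


def assign(samples, profiles):
    """E-like step (hard): each sample goes to its most likely profile."""
    return [
        max(range(len(profiles)), key=lambda c: log_likelihood(s, profiles[c]))
        for s in samples
    ]


def cluster_profiles(samples, k, n_iter=50):
    """Alternate the two steps until the assignments stop changing."""
    labels = init_labels(samples, k)
    for _ in range(n_iter):
        new_labels = assign(samples, fit_profiles(samples, labels, k))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, fit_profiles(samples, labels, k)


def altered_positions(profile, neutral=1):
    """Probes whose most frequent state in the cluster is not neutral."""
    return [pos for pos, row in enumerate(profile) if row.index(max(row)) != neutral]


if __name__ == "__main__":
    gain = [2, 2, 2, 1, 1, 1, 1, 1]
    loss = [1, 1, 1, 1, 1, 0, 0, 0]
    toy = [gain, gain[:], [2, 2, 2, 1, 1, 1, 1, 2],
           loss, loss[:], [0, 1, 1, 1, 1, 0, 0, 0]]
    labels, profiles = cluster_profiles(toy, k=2)
    print(labels, [altered_positions(p) for p in profiles])
```

On the toy data in the demo, the two groups differ in where their recurrent alterations lie, and `altered_positions` gives a crude stand-in for the sparse profile of each cluster. A hidden Markov model profile, as used in the method itself, would additionally model dependence between neighbouring probes, which this sketch ignores.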