Finding associations among SNPs for prostate cancer using collaborative filtering

  • Authors:
  • Rohit Kugaonkar; Aryya Gangopadhyay; Yelena Yesha; Anupam Joshi; Yaacov Yesha; Michael Grasso; Mary Brady; Naphtali Rishe

  • Affiliations:
  • University of Maryland Baltimore County, Baltimore, MD, USA; University of Maryland Baltimore County, Baltimore, MD, USA; University of Maryland Baltimore County, Baltimore, MD, USA; University of Maryland Baltimore County, Baltimore, MD, USA; University of Maryland Baltimore County, Baltimore, MD, USA; University of Maryland School of Medicine, Baltimore, MD, USA; National Institute of Standards and Technology, Gaithersburg, MD, USA; Florida International University, Miami, FL, USA

  • Venue:
  • Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics
  • Year:
  • 2012

Abstract

Prostate cancer is the second leading cause of cancer-related deaths among men. Because prostate cancer grows slowly, less aggressive cases sometimes do not require surgical treatment. Recent debates over prostate-specific antigen (PSA) screening have drawn new attention to prostate cancer. Genome-based screening can potentially help in assessing the risk of developing prostate cancer. Because prostate cancer is genetically complex, studying the entire genome is essential for finding relevant genomic traits. Because genotyping all single nucleotide polymorphisms (SNPs) is costly, it is essential to find tag SNPs that can represent other SNPs. Earlier methods for finding tag SNPs from associations between SNPs either use SNP location information or are based on data with very few SNP markers per sample. Our study is based on 2,300 samples with 550,000 SNPs each. We did not use SNP location information or any predefined standard cut-offs to find tag SNPs. Our approach uses collaborative filtering methods to find pairwise associations among SNPs and thereby list the top-N tag SNPs. We found 25 tag SNPs that have the highest similarities to other SNPs. In addition, we found 16 more SNPs that are highly correlated with the known high-risk SNPs associated with prostate cancer. We used some of these newly found SNPs with five different classification algorithms and observed some improvement in prostate cancer prediction accuracy over using the original known high-risk SNPs.
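
The abstract does not say which similarity measure or ranking rule was used, so the following is only a minimal sketch of how item-based collaborative filtering could rank tag SNPs. It treats each SNP as an "item" and each sample as a "user", and it assumes genotypes are coded as minor-allele counts (0, 1, 2), that cosine similarity is the pairwise measure, and that SNPs are ranked by their mean similarity to all other SNPs. The function names and the toy SNP identifiers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two genotype vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_tag_snps(
    genotypes: List[List[int]], snp_ids: List[str], n: int
) -> List[Tuple[str, float]]:
    """Rank SNPs by mean pairwise similarity to all other SNPs.

    genotypes: rows are samples, columns are SNPs (assumed 0/1/2 coding).
    Returns the n highest-ranked (snp_id, mean_similarity) pairs.
    """
    columns = list(zip(*genotypes))  # one vector per SNP
    m = len(columns)
    totals = [0.0] * m
    # Each unordered pair is computed once and credited to both SNPs.
    for i in range(m):
        for j in range(i + 1, m):
            s = cosine_similarity(columns[i], columns[j])
            totals[i] += s
            totals[j] += s
    denom = max(m - 1, 1)  # avoid division by zero for a single SNP
    ranked = sorted(range(m), key=lambda k: totals[k], reverse=True)
    return [(snp_ids[k], totals[k] / denom) for k in ranked[:n]]


if __name__ == "__main__":
    # Toy data: 4 samples x 3 SNPs; snp_a and snp_b carry identical genotypes.
    toy = [
        [0, 0, 2],
        [1, 1, 0],
        [2, 2, 0],
        [1, 1, 1],
    ]
    for snp, score in top_n_tag_snps(toy, ["snp_a", "snp_b", "snp_c"], n=2):
        print(snp, round(score, 3))
```

The all-pairs loop is quadratic in the number of SNPs, so a data set on the scale described in the abstract would need a sparse, blocked, or approximate similarity computation instead of this direct form.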