Personal genomes: a new frontier in database research

  • Authors:
  • Taro L. Saito

  • Affiliations:
  • Department of Computational Biology, The University of Tokyo, Japan

  • Venue:
  • DNIS'11 Proceedings of the 7th international conference on Databases in Networked Information Systems
  • Year:
  • 2011

Abstract

Owing to recent technological improvements in next-generation sequencers, sequencing the genomes of individuals has become common in biological and medical research. The amount of data produced by next-generation sequencers is enormous. Today, the DNA of more than 10,000 people has been sequenced worldwide, and terabytes of data are produced on a daily basis. The types of genome information also vary according to the biological experiments used to prepare DNA samples. Biologists and medical scientists now face the problem of managing these huge volumes of data of many different types. Existing DBMSs, whose major targets are business applications, are not suited to managing these biological data. Loading such large data into a DBMS is time-consuming, and current database queries cannot accommodate the variety of bioinformatics tools written in various programming languages. Processing bioinformatics workflows in a parallel and distributed manner is also a challenging problem. In this paper, in the hope of recruiting database researchers into this rapidly progressing area of biological and medical research, we introduce several challenges in genome informatics from the viewpoint of using existing DBMSs to process next-generation sequencer data.