Data representation and algorithms for biomedical informatics applications

  • Authors: Lucila Ohno-Machado; Griffin M. Weber

  • Affiliations: Harvard University; Harvard University

  • Venue: Data representation and algorithms for biomedical informatics applications
  • Year: 2005

Abstract

Biomedical informatics is an emerging field at the intersection of computer science, biology, and medicine. The types of problems encountered in biomedical informatics are not new to computer science. These include representation of data, natural language processing, cluster analysis, and algorithm optimization. However, recently they have become far more important to the life sciences. This is a direct result of major technological advances in the biological sciences and medical research. New DNA sequencing tools can decode the entire genome of an organism, generating gigabytes of data that must be analyzed. Web portals provide physicians a means to communicate with each other, but the algorithms that push content to users must be fast and flexible enough to support the many roles a doctor might have. Digital libraries give scientists instant online access to thousands of journals, but advanced search techniques are required or else users can be swamped with information. This thesis explores five challenging computer science problems that all have applications in biomedical informatics: (1) performing efficient joins on set-valued data fields, (2) automatically assigning summary keywords to text-based documents, (3) using natural language processing to generate human-like dialog, (4) quickly finding the Hamming distance between sequences, and (5) improving stochastic hill-climbing algorithms for phylogenetic tree reconstruction by representing the data as an ordering of taxa.
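
For reference, the Hamming distance named in problem (4) is the number of positions at which two equal-length sequences differ. The abstract does not describe the thesis's own fast method, so the following is only a minimal sketch of the baseline definition; the function name `hamming_distance` and the example sequences are illustrative assumptions, not taken from the thesis.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ.

    Naive O(n) baseline for illustration only; this is NOT the
    accelerated method developed in the thesis.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # Compare the sequences position by position and count mismatches.
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Hypothetical DNA fragments chosen for the example.
    print(hamming_distance("GATTACA", "GACTATA"))
```

Any faster approach has to return the same counts as this direct comparison, which makes the baseline useful as a correctness check.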