On the Reachability of Trustworthy Information from Integrated Exploratory Biological Queries

  • Authors:
  • Eithon Cadag; Peter Tarczy-Hornoch; Peter J. Myler

  • Affiliations:
  • University of Washington, Seattle 98195; University of Washington, Seattle 98195; University of Washington, Seattle 98195 and Seattle Biomedical Research Institute, Seattle 98109

  • Venue:
  • DILS '09 Proceedings of the 6th International Workshop on Data Integration in the Life Sciences
  • Year:
  • 2009

Abstract

Levels of curation across biological databases are widely recognized as highly variable, depending on provenance and type. Despite this ambiguous quality, searches against biological sources, such as those for sequence homology, remain a frontline strategy for biomedical scientists studying molecular data. Here we investigate the accessibility of well-curated data retrieved by exploratory queries across multiple sources. We present the architecture and design of a lightweight data integration platform conducive to graph-theoretic analysis. Using data collected via this framework, we examine the reachability of evidence-supported annotations across triangulated sources in the face of uncertainty, using a simple random sampling model oriented around fault tolerance. We characterize the accessibility of high-quality data from uncertain queries and the levels of redundancy across data sources. We find that, in general, encountering non-experimentally verified annotations is nearly as likely as encountering experimentally verified ones, with the exception of a group of proteins whose link structure is dominated by experimental evidence. Finally, we discuss the prospect of determining the overall accessibility of relevant information from metadata about a query and its results.
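
The idea of reachability under a fault-tolerance-style random sampling model can be illustrated with a small sketch. This is not the authors' model or platform: the toy graph, the node names (`query`, `hitA`, `exp_annot`, and so on), the independent per-link failure probability `p_fail`, and both function names are assumptions made purely for illustration. Each trial randomly drops links, then checks whether any annotation of a given evidence class is still reachable from the query.

```python
import random
from collections import deque


def reachable(graph, start, alive):
    """Breadth-first search from `start`, following only edges in `alive`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if (node, nxt) in alive and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def estimate_reachability(graph, start, targets, p_fail, trials=2000, seed=0):
    """Monte Carlo estimate of the chance that at least one target node is
    reachable from `start` when every edge fails independently with
    probability `p_fail` (an assumed, simplified uncertainty model)."""
    rng = random.Random(seed)
    edges = [(u, v) for u, vs in graph.items() for v in vs]
    hits = 0
    for _ in range(trials):
        # Sample which links survive in this trial.
        alive = {e for e in edges if rng.random() >= p_fail}
        if reachable(graph, start, alive) & targets:
            hits += 1
    return hits / trials


# Hypothetical toy link structure: a query returns two hits; both link to an
# experimentally verified annotation, only one links to a predicted one.
graph = {
    "query": ["hitA", "hitB"],
    "hitA": ["exp_annot"],
    "hitB": ["exp_annot", "pred_annot"],
}
experimental = {"exp_annot"}
predicted = {"pred_annot"}

p_exp = estimate_reachability(graph, "query", experimental, p_fail=0.3)
p_pred = estimate_reachability(graph, "query", predicted, p_fail=0.3)
print(f"experimental: {p_exp:.2f}, predicted: {p_pred:.2f}")
```

In this toy graph the experimentally verified annotation is reachable by two redundant paths, so its estimated reachability degrades more slowly as links fail. That is the sense in which redundancy across sources affects accessibility.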