Levels of curation across biological databases are widely recognized as highly variable, depending on provenance and data type. Despite this uneven quality, searches against biological sources, such as sequence-homology searches, remain a frontline strategy for biomedical scientists studying molecular data. Here we investigate how accessible well-curated data are from exploratory queries that span multiple sources. We present the architecture and design of a lightweight data integration platform conducive to graph-theoretic analysis. With data collected through this framework, we examine the reachability of evidence-supported annotations across triangulated sources in the face of uncertainty, using a simple random-sampling model oriented around fault tolerance. We characterize the accessibility of high-quality data from uncertain queries, as well as the levels of redundancy across data sources, and find that, in general, non-experimentally verified annotations are nearly as likely to be encountered as experimentally verified ones, with the exception of a group of proteins whose link structure is dominated by experimental evidence. Finally, we discuss the prospect of determining the overall accessibility of relevant information from metadata about a query and its results.
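The abstract does not spell out its sampling model. The following is therefore only a minimal sketch under stated assumptions. Records and annotations are treated as nodes in a directed graph, and each link fails independently with probability `1 - keep_prob`, a simple edge-failure model standing in for "fault tolerance". Accessibility is then estimated by Monte Carlo as the fraction of trials in which a breadth-first search from the query reaches at least one annotation of a given evidence class. The graph, node names, evidence labels, and the functions `reachable` and `estimate_access` are hypothetical illustrations, not the authors' platform or data.

```python
import random
from collections import deque

# Hypothetical toy graph (an assumption, not the paper's data): a query protein
# links to records in three sources, which link to annotations.
EDGES = {
    "query:P1": ["src_a:rec1", "src_b:rec7", "src_c:rec3"],
    "src_a:rec1": ["ann:GO1", "ann:GO2"],
    "src_b:rec7": ["ann:GO2", "ann:GO3"],
    "src_c:rec3": ["ann:GO3"],
}
# Hypothetical evidence class for each annotation node.
EVIDENCE = {
    "ann:GO1": "experimental",
    "ann:GO2": "experimental",
    "ann:GO3": "non-experimental",
}


def reachable(start, edges, keep_prob, rng):
    """BFS over a random subgraph where each edge survives independently with keep_prob.

    Edges are sampled lazily, at most once each, which yields the same
    reachability distribution as sampling every edge up front.
    """
    seen = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for nxt in edges.get(node, []):
            # The link is lost with probability 1 - keep_prob.
            if nxt not in seen and rng.random() < keep_prob:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def estimate_access(start, edges, evidence, keep_prob, trials=10_000, seed=0):
    """Estimate, per evidence class, P(at least one annotation of that class is reached)."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    hits = {label: 0 for label in set(evidence.values())}
    for _ in range(trials):
        seen = reachable(start, edges, keep_prob, rng)
        for label in {evidence[n] for n in seen if n in evidence}:
            hits[label] += 1
    return {label: count / trials for label, count in hits.items()}


if __name__ == "__main__":
    print(estimate_access("query:P1", EDGES, EVIDENCE, keep_prob=0.8))
```

For this toy graph the exact probabilities at `keep_prob = 0.8` can be worked out by hand. For the experimental class, the result is 1 - (1 - 0.8 × 0.96)(1 - 0.64) = 0.91648. For the non-experimental class, it is 1 - 0.36² = 0.8704. The Monte Carlo estimates should land close to those values. Having several independent paths to the same annotation (redundancy across sources) is what keeps these probabilities high when individual links fail.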