Who shares? Who doesn't?: bibliometric factors associated with open archiving of biomedical datasets

  • Authors: Heather A. Piwowar
  • Affiliations: National Evolutionary Synthesis Center, Durham, NC
  • Venue: Proceedings of the 73rd ASIS&T Annual Meeting on Navigating Streams in an Information Ecosystem - Volume 47
  • Year: 2010

Abstract

Many initiatives encourage investigators to share their raw research datasets in pursuit of increased research quality and efficiency. Despite these investments of time and money, we do not yet understand the impact of these initiatives. In this study, I use bibliometric methods to understand the prevalence and patterns with which investigators publicly share their raw gene expression microarray datasets after study publication. Automated methods were used to identify 11,603 published studies that created gene expression microarray data. At least 25% of these studies have datasets in one of the two predominant public databases for microarray data, with the share rising from 5% in 2001 to 35% in 2009. Fifteen factors that described authorship, funding, institution, publication, and domain environments were derived from 124 article attributes. Most factors were associated with the prevalence of data sharing (p < …). In second-order factor analysis, previously sharing gene expression microarray data was most positively associated with high data sharing rates, whereas publishing a study on cancer or human subjects was strongly associated with a lower probability of data sharing. I hope these methods and results will contribute to a deeper understanding of data sharing behavior and eventually more effective data sharing initiatives.