Data integration in multi-dimensional data sets: informational asymmetry in the valid correlation of subdivided samples

  • Authors:
  • Qing T. Zeng;Juan Pablo Pratt;Jane Pak;Eun-Young Kim;Dino Ravnic;Harold Huss;Steven J. Mentzer

  • Affiliations:
  • Decision Systems Group;Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA;Decision Systems Group;Decision Systems Group;Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA;Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA;Department of Surgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA

  • Venue:
  • ISBMDA'06 Proceedings of the 7th international conference on Biological and Medical Data Analysis
  • Year:
  • 2006

Abstract

Background: Flow cytometry is the only currently available high-throughput technology that can measure multiple physical and molecular characteristics of individual cells. It is common in flow cytometry to measure a relatively large number of characteristics, or features, by performing separate experiments on subdivided samples. Correlating data from multiple experiments using certain shared features (e.g. cell size) could provide useful information on how the unshared features combine. Such correlations, however, are not always reliable.

Methods: We developed a method to assess correlation reliability by estimating the percentage of cells that can be unambiguously correlated between two samples. This method was evaluated using 81 pairs of subdivided samples of microspheres (artificial cells) with known molecular characteristics.

Results: A strong correlation (R=0.85) was found between the estimated and actual percentages of unambiguous correlation.

Conclusion: The correlation reliability measure we developed can be used to support data integration of experiments on subdivided samples.
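The abstract does not say how the percentage of unambiguously correlated cells is estimated, so the following Python sketch is only one plausible reading and not the authors' method. It assumes a single shared feature (such as cell size) and treats a cell in one sample as unambiguously correlated when exactly one cell in the other sample has a shared-feature value within a tolerance. The function name `unambiguous_fraction`, the parameter `tol`, and the sample values are all hypothetical.

```python
from bisect import bisect_left, bisect_right


def unambiguous_fraction(shared_a, shared_b, tol):
    """Fraction of cells in sample A that have exactly one candidate in sample B.

    Assumption (not from the paper): a cell in B is a candidate match when
    its shared-feature value lies within +/- tol of the cell in A.
    """
    if not shared_a:
        return 0.0
    sorted_b = sorted(shared_b)
    unique = 0
    for value in shared_a:
        # Count cells in B whose shared feature falls in [value - tol, value + tol].
        lo = bisect_left(sorted_b, value - tol)
        hi = bisect_right(sorted_b, value + tol)
        if hi - lo == 1:
            unique += 1
    return unique / len(shared_a)


# Hypothetical shared-feature values for two subdivided samples.
sample_a = [1.0, 2.0, 5.0, 9.0]
sample_b = [1.25, 4.75, 5.25, 20.0]

fraction = unambiguous_fraction(sample_a, sample_b, tol=0.5)
print(f"Unambiguously correlated: {fraction:.0%}")
```

In this toy input only the first cell of `sample_a` has a single candidate; the third has two candidates and the other two have none. A narrower tolerance reduces ambiguity but leaves more cells without any candidate, which is why a reliability estimate of this kind is useful before integrating the unshared features.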