Is standard multivariate analysis sufficient in clinical and epidemiological studies?

  • Authors:
  • Tânia F. G. G. Cova;Jorge L. G. F. S. C. Pereira;Alberto A. C. C. Pais

  • Affiliations:
  • Chemistry Department, University of Coimbra, Rua Larga, 3004-535 Coimbra, Portugal (all authors)

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2013

Abstract

Clinical tests and epidemiological studies often produce large amounts of data that are multivariate in nature. The analysis of these data is, in most cases, of importance comparable to that of the clinical and sampling tasks. Simple, easily interpretable techniques from chemometrics provide most of the ingredients needed to carry out this analysis. We have selected available data from different sources pertaining to cancer diagnosis and incidence: (1) cytological diagnosis of breast cancer, (2) classification of breast tissues through parameters obtained from impedance spectra and (3) distribution of new cancer cases in the United States. Hierarchical cluster analysis (HCA) is needed especially in cases where there is no a priori identification of classes, as it suggests a structure of the data based on clusters. These clusters, or the classes, are then further detailed and rationalized by principal component analysis (PCA). Partial least squares (PLS) and linear discriminant analysis (LDA) provide further insight into the systems. An additional step for understanding the data set is the removal of less characteristic data (NR) using a density-based approach, which leaves the structure of the data set more clearly defined. The results reveal that breast cytology diagnosis relies on variables conveying mostly the same type of information, which are thus interchangeable in nature. In the study on tissue characterization by electrical measurements, the distribution of the different types of tissues can be easily constructed. Finally, the distribution of new cancer cases possesses clear, easily unravelled geographical patterns.
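
As an illustration of two of the steps named in the abstract, the following is a minimal, dependency-free Python sketch of a density-based removal of less characteristic points followed by hierarchical clustering. It is not the authors' implementation: the abstract does not state the density criterion or the linkage rule, so the neighbour-count filter, the `radius` and `min_neighbours` parameters, the average linkage, and the synthetic two-group data are all assumptions made for this example. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_sparse_points(points, radius, min_neighbours):
    """Assumed density criterion: keep a point only if at least
    `min_neighbours` other points lie within `radius` of it."""
    kept = []
    for i, p in enumerate(points):
        neighbours = sum(
            1
            for j, q in enumerate(points)
            if j != i and euclidean(p, q) <= radius
        )
        if neighbours >= min_neighbours:
            kept.append(p)
    return kept


def average_linkage_clusters(points, n_clusters):
    """Agglomerative clustering with average linkage (an assumed choice).
    Starts with one cluster per point and repeatedly merges the two
    clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(
                    euclidean(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Synthetic data (illustrative only): two compact 4x4 grids of points
# plus two isolated points that play the role of less characteristic data.
group_a = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
group_b = [(5.0 + 0.1 * i, 5.0 + 0.1 * j) for i in range(4) for j in range(4)]
isolated = [(2.5, 10.0), (-6.0, 4.0)]
points = group_a + group_b + isolated

filtered = remove_sparse_points(points, radius=1.0, min_neighbours=3)
clusters = average_linkage_clusters(filtered, n_clusters=2)

for label, members in enumerate(clusters):
    print(f"cluster {label}: {len(members)} points")
```

In this sketch the filter is applied before clustering, so that the isolated points cannot distort the merge order. The quadratic neighbour search and the naive linkage loop are only practical for small data sets, and for real clinical data an established library implementation would normally be preferred.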