Multiple correspondence analysis for "tall" data sets

  • Authors:
  • Angelos Markos; George Menexes; Theophilos Papadimitriou

  • Affiliations:
  • (Correspd. Dept. Tel.: +30 2310 891870; Fax: +30 2310 891848; E-mail: amarkos@uom.gr) Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece; Lab of Agronomy, School of Agriculture, Aristotle University of Thessaloniki, Greece; Department of Int. Economic Relations and Development, Democritus University of Thrace, Komotini, Greece

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2009

Abstract

Correspondence Analysis (CA) is a statistical method for graphically representing the contingencies between the rows and the columns of a categorical data set. A critical step of the CA algorithm is the Singular Value Decomposition (SVD) of a coded matrix. The size of this matrix drastically affects the computational cost of the analysis: as the matrix grows, the method becomes computationally expensive or even infeasible. In this paper we propose an alternative scheme that overcomes this limitation without affecting the accuracy of the results. A set of Monte Carlo simulations and real data applications showed that the proposed approach is more efficient than the standard one, especially for "tall" data sets.
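
The abstract does not describe the alternative scheme itself, so the following is only an illustrative sketch of one well-known way to cut SVD cost on a tall matrix, not necessarily the authors' method. It assumes "tall" means many more rows (n) than columns (p) and recovers the thin SVD from the eigendecomposition of the small p x p cross-product matrix. The function name `tall_svd_via_gram`, the tolerance, and the simulated indicator matrix are hypothetical choices made for this example.

```python
import numpy as np

def tall_svd_via_gram(X, tol=1e-10):
    """Thin SVD of a tall matrix X (n >> p) via the p x p matrix X^T X.

    Illustrative only: forming X^T X squares the condition number, so very
    small singular values are recovered less accurately than with a direct SVD.
    """
    G = X.T @ X                           # p x p cross-product matrix
    evals, V = np.linalg.eigh(G)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]       # reorder to descending
    evals, V = evals[order], V[:, order]
    keep = evals > tol * evals[0]         # drop numerically zero directions
    s = np.sqrt(evals[keep])              # singular values of X
    V = V[:, keep]                        # right singular vectors
    U = (X @ V) / s                       # left singular vectors, n x r
    return U, s, V

# Simulated "coded" (indicator) matrix: n respondents, 3 categorical
# variables with 4 categories each, giving an n x 12 matrix of 0/1 entries.
rng = np.random.default_rng(0)
n = 5000
codes = rng.integers(0, 4, size=(n, 3))
Z = np.concatenate([np.eye(4)[codes[:, j]] for j in range(3)], axis=1)

U, s, V = tall_svd_via_gram(Z)
s_ref = np.linalg.svd(Z, compute_uv=False)  # direct SVD, for comparison
```

In this sketch the eigendecomposition works on a 12 x 12 matrix regardless of n, which is the kind of saving a tall data set allows. A full CA would additionally apply the row and column mass weighting before the decomposition, which is omitted here.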