Supervised clustering via principal component analysis in a retrieval application

  • Authors: Esteban Garcia-Cuesta; Ines M. Galvan; Antonio J. de Castro
  • Affiliations: Leganes (Madrid), Spain; Leganes (Madrid), Spain; Leganes (Madrid), Spain
  • Venue: Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
  • Year: 2009

Abstract

In regression problems where the number of predictors exceeds the number of observations and the predictors are highly correlated, dimensionality reduction or variable selection is required. In this paper we deal with a real application in which we want to retrieve the physical characteristics of a combustion process from measurements obtained with a spectroscopic sensor. This application exhibits multicollinearity and is, furthermore, an ill-posed problem. Guided by this application scenario, we propose a clustering approach to find homogeneous subsets of data that are embedded in arbitrarily oriented linear manifolds. The model is developed under certain assumptions guided by a priori knowledge of the problem. The resulting partition preserves both the a priori assumptions and the homogeneity of the models. We thereby break the whole problem into n subproblems, improving the prediction accuracy of each relative to a global solution. We show the improvements obtained in a real application scenario: estimating temperature from spectroscopic data in a remote sensing framework.
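
The idea of replacing one global model with several local ones can be illustrated with a minimal sketch. It is not the authors' algorithm. The partition is supplied through a hypothetical `labels` array instead of being found by the supervised clustering the paper proposes, principal component regression stands in for the local models, and the data are synthetic (two regimes, each lying near its own line in a 50-dimensional space, with fewer observations than predictors). All function names are illustrative.

```python
import numpy as np


def fit_pcr(X, y, k):
    """Principal component regression: project onto the top-k PCs, then least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                       # shape (p, k)
    scores = Xc @ V                    # shape (n, k)
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, V, coef


def predict_pcr(model, X):
    x_mean, y_mean, V, coef = model
    return (X - x_mean) @ V @ coef + y_mean


def fit_local_pcr(X, y, labels, k):
    """Fit one PCR model per subset; `labels` is an assumed, externally given partition."""
    return {c: fit_pcr(X[labels == c], y[labels == c], k) for c in np.unique(labels)}


def predict_local_pcr(models, X, labels):
    y_hat = np.full(len(X), np.nan)    # stays NaN for labels without a fitted model
    for c, model in models.items():
        mask = labels == c
        y_hat[mask] = predict_pcr(model, X[mask])
    return y_hat


# Synthetic data (an assumption for illustration, not the paper's spectra):
# 40 observations, 50 predictors, two regimes with different directions and responses.
rng = np.random.default_rng(0)
n, p = 20, 50
a0, a1 = rng.normal(size=p), rng.normal(size=p)
t = rng.uniform(size=2 * n)
labels = np.repeat([0, 1], n)
X = np.where(labels[:, None] == 0, t[:, None] * a0, t[:, None] * a1)
X += 0.01 * rng.normal(size=X.shape)
y = np.where(labels == 0, 2.0 * t, 5.0 - 3.0 * t)

# Training-set mean squared error of one global model versus one model per subset.
global_mse = np.mean((predict_pcr(fit_pcr(X, y, 1), X) - y) ** 2)
local_models = fit_local_pcr(X, y, labels, 1)
local_mse = np.mean((predict_local_pcr(local_models, X, labels) - y) ** 2)
print(f"global MSE: {global_mse:.4g}, local MSE: {local_mse:.4g}")
```

The errors printed here are in-sample and serve only to show the mechanism; a real comparison would use held-out data, and the number of components per subset would have to be chosen for the problem at hand.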