Study of Principal Components on Classification of Problematic Wine Fermentations

Authors:
Alejandra Urtubia U.;J. Ricardo Pérez-Correa
Affiliations:
Escuela de Ingeniería Industrial. Facultad de Ciencias Económicas y Administrativas, Universidad de Valparaíso, Valparaíso, Chile and Departamento de Ingeniería Quími ...;Departamento de Ingeniería Química y Bioprocesos, Escuela de Ingeniería, Pontificia Universidad Católica de Chile, Santiago 22, Chile
Venue:
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
Year:
2009

Abstract

Data mining techniques have already proven useful for classifying wine fermentations as problematic. These techniques are therefore a good option for winemakers, who currently lack the tools to identify early signs of undesirable fermentation behavior and are consequently unable to take mitigating actions. In this study we assessed how much the performance of a K-means clustering procedure for classifying fermentations is affected by the number of principal components (PCs) retained when principal component analysis (PCA) is first applied to reduce the dimensionality of the available data. Three PCs were enough to preserve the overall information of a dataset containing only reliable measurements. In this case, 40% of the problematic fermentations were detected. In contrast, with a more complete dataset that also contained unreliable measurements, different numbers of PCs yielded different classifications. Here, 33% of the problematic fermentations were detected.
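The procedure summarized in the abstract (reduce the data with PCA, cluster the scores with K-means, and check how the classification changes with the number of PCs) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, and the number of clusters (2), the number of variables (8), the standardization step, and the helper names `pca_scores`, `kmeans`, and `agreement` are all assumptions made for illustration only.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project standardized data onto its first n_components principal components."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every sample to every center, shape (n_samples, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def agreement(a, b):
    """Fraction of samples grouped identically by two 2-cluster labelings,
    ignoring which cluster is called 0 and which is called 1."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)


# Synthetic stand-in for fermentation measurements (assumption, not real data):
# 60 fermentations x 8 measured variables, drawn from two shifted groups.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(30, 8)),
    rng.normal(1.5, 1.0, size=(30, 8)),
])

# Reference classification using all principal components.
reference = kmeans(pca_scores(X, X.shape[1]), k=2)

# How much does the classification change as fewer PCs are kept?
results = {}
for n_pcs in range(1, X.shape[1] + 1):
    labels = kmeans(pca_scores(X, n_pcs), k=2)
    results[n_pcs] = agreement(labels, reference)
    print(f"{n_pcs} PCs: agreement with full-data clustering = {results[n_pcs]:.2f}")
```

If the agreement stops changing after a small number of PCs, those components carry most of the information that the clustering uses, which is the kind of behavior the abstract reports for the reliable dataset with three PCs. Measuring the detection rate of problematic fermentations would additionally require known fermentation outcomes, which this sketch does not model.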