Mining gene expression data with pattern structures in formal concept analysis

  • Authors:
  • Mehdi Kaytoue; Sergei O. Kuznetsov; Amedeo Napoli; Sébastien Duplessis

  • Affiliations:
  • Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Campus Scientifique, B.P. 70239, 54500 Vandœuvre-lès-Nancy, France; State University Higher School of Economics, Pokrovskiy Bd. 11, 109028 Moscow, Russia; Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Campus Scientifique, B.P. 70239, 54500 Vandœuvre-lès-Nancy, France; UMR 1136, Institut National de la Recherche Agronomique (INRA), Nancy Université, Interactions Arbres/Micro-organismes, 54280 Champenoux, France

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2011

Abstract

This paper addresses the important problem of efficiently mining numerical data with formal concept analysis (FCA). Classically, the only way to apply FCA is to binarize the data by means of a so-called scaling procedure. This may either cause a loss of information or produce large, dense binary data that are known to be hard to process. In the context of gene expression data analysis, we propose and compare two FCA-based methods for mining numerical data and show that they are equivalent. The first relies on a particular scaling that encodes all possible intervals of attribute values, and uses standard FCA techniques. The second relies on pattern structures, requires no a priori transformation of the data, and is shown to be more computationally efficient and to produce more readable results. Experiments with real-world gene expression data are discussed and provide a practical basis for comparing and evaluating the two methods.
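The equivalence of the two approaches can be illustrated on a toy example. The sketch below is not the authors' implementation and rests on several assumptions: the interval-encoding scaling is taken to be an interordinal scale (binary attributes "m <= w" and "m >= w" for every observed value w of attribute m); the similarity of two gene descriptions is taken to be the componentwise convex hull of their intervals; the 4-gene, 3-condition table is made up; and closures are enumerated by brute force over all non-empty subsets, which is only feasible for tiny data. All function and variable names are hypothetical.

```python
from functools import reduce
from itertools import combinations

# Hypothetical toy expression table: genes (objects) x 3 conditions (numeric attributes).
DATA = {
    "g1": (5, 7, 6),
    "g2": (6, 8, 4),
    "g3": (4, 8, 5),
    "g4": (4, 9, 8),
}
GENES = sorted(DATA)
N_ATTRS = 3

# ---- Method 1 (assumed): interordinal scaling, then standard FCA derivation ----
def scale_interordinal(data):
    """Binarize: for each attribute m and observed value w, add (m, '<=', w) and (m, '>=', w)."""
    values = [{row[m] for row in data.values()} for m in range(N_ATTRS)]
    return {
        g: {(m, "<=", w) for m in range(N_ATTRS) for w in values[m] if row[m] <= w}
        | {(m, ">=", w) for m in range(N_ATTRS) for w in values[m] if row[m] >= w}
        for g, row in data.items()
    }

CONTEXT = scale_interordinal(DATA)

def closure_scaled(extent):
    """A'' in the binary context: genes having every binary attribute shared by all of A."""
    shared = set.intersection(*(CONTEXT[g] for g in extent))
    return frozenset(g for g in GENES if shared <= CONTEXT[g])

# ---- Method 2 (assumed): interval pattern structure, no binarization ----
def delta(g):
    """Description of one gene: a vector of degenerate intervals [v, v]."""
    return tuple((v, v) for v in DATA[g])

def meet(c, d):
    """Similarity of two interval vectors: componentwise convex hull."""
    return tuple((min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(c, d))

def leq(c, d):
    """Subsumption c <= d iff meet(c, d) == c, i.e. every interval of d lies inside c."""
    return meet(c, d) == c

def pattern_intent(extent):
    """A^box: similarity of the descriptions of all genes in A."""
    return reduce(meet, (delta(g) for g in extent))

def pattern_extent(d):
    """d^box: genes whose description is subsumed by pattern d."""
    return frozenset(g for g in GENES if leq(d, delta(g)))

NONEMPTY = [frozenset(a) for k in range(1, len(GENES) + 1) for a in combinations(GENES, k)]
EXTENTS_SCALED = {closure_scaled(a) for a in NONEMPTY}
EXTENTS_PATTERN = {pattern_extent(pattern_intent(a)) for a in NONEMPTY}

print(EXTENTS_SCALED == EXTENTS_PATTERN)  # True
print(pattern_intent(["g1", "g2"]))       # ((5, 6), (7, 8), (4, 6))
```

On this table both methods yield the same 13 non-empty extents, but the pattern-structure version describes each one directly by a vector of intervals, e.g. ((5, 6), (7, 8), (4, 6)) for {g1, g2}, instead of by a set of binary attributes. A practical implementation would replace the subset enumeration with a dedicated concept-enumeration algorithm.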