Boolean property encoding for local set pattern discovery: an application to gene expression data analysis

  • Authors:
  • Ruggero G. Pensa;Jean-François Boulicaut

  • Affiliations:
  • INSA Lyon, LIRIS CNRS UMR 5205, Villeurbanne cedex, France;INSA Lyon, LIRIS CNRS UMR 5205, Villeurbanne cedex, France

  • Venue:
  • LPD'04 Proceedings of the 2004 international conference on Local Pattern Detection
  • Year:
  • 2004

Abstract

In the domain of gene expression data analysis, several researchers have recently emphasized the promising application of local pattern (e.g., association rules, closed sets) discovery techniques to Boolean matrices that encode gene properties. Detecting local patterns by means of complete constraint-based mining techniques turns out to be an important complement to heuristic global model mining. To get the most from local set pattern mining approaches, a necessary step is the encoding of gene expression properties (e.g., over-expression). This preprocessing phase has a crucial impact on both the quantity and the quality of the extracted patterns. In this paper, we study the impact of discretization techniques by a sound comparison between dendrograms, i.e., the trees generated by a hierarchical clustering algorithm on the raw numerical expression data and on the various Boolean matrices derived from it. Thanks to a new similarity measure, we can select the Boolean property encoding technique that best preserves the similarity structures holding in the raw data. The discussion relies on experimental results for three gene expression data sets. We believe our framework is an interesting direction of work for the many application domains in which (a) local set patterns have proved useful, and (b) Boolean properties have to be derived from raw numerical data.
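
To make the notion of Boolean property encoding concrete, here is a minimal Python sketch of one possible over-expression rule. It is an illustration only and is not taken from the abstract: the function name `encode_over_expression`, the `fraction` parameter, and the rule itself (a gene counts as over-expressed in a sample when its value lies within a given fraction of that gene's maximum) are all assumptions.

```python
def encode_over_expression(matrix, fraction=0.1):
    """Encode a numerical expression matrix (genes x samples) as Booleans.

    Hypothetical rule, assumed for illustration: a cell is True when the
    value is within `fraction` of the row (gene) maximum.
    """
    encoded = []
    for row in matrix:
        top = max(row)
        # Threshold sits `fraction` of the maximum below the maximum.
        threshold = top - fraction * abs(top)
        encoded.append([value >= threshold for value in row])
    return encoded


# Toy data: two genes measured in four samples (made-up values).
raw = [
    [1.0, 5.0, 4.8, 0.5],
    [2.0, 2.1, 0.3, 2.05],
]
boolean = encode_over_expression(raw, fraction=0.1)
for row in boolean:
    print(row)
```

Changing `fraction` changes how many cells are set to true, which gives a feel for why the choice of encoding affects both the number and the quality of the patterns mined afterwards.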