Boolean property encoding for local set pattern discovery: an application to gene expression data analysis

Authors:
Ruggero G. Pensa;Jean-François Boulicaut
Affiliations:
INSA Lyon, LIRIS CNRS UMR 5205, Villeurbanne cedex, France;INSA Lyon, LIRIS CNRS UMR 5205, Villeurbanne cedex, France
Venue:
LPD'04 Proceedings of the 2004 international conference on Local Pattern Detection
Year:
2004

Abstract

In gene expression data analysis, several researchers have recently emphasized the promise of applying local pattern discovery techniques (e.g., association rules, closed sets) to Boolean matrices that encode gene properties. Detecting local patterns by means of complete constraint-based mining techniques turns out to be an important complement to heuristic global model mining. To get the most from local set pattern mining approaches, a necessary step is the encoding of gene expression properties (e.g., over-expression). This preprocessing phase has a crucial impact on both the quantity and the quality of the extracted patterns. In this paper, we study the impact of discretization techniques through a sound comparison between dendrograms, i.e., the trees generated by a hierarchical clustering algorithm, built on the raw numerical expression data and on the various Boolean matrices derived from it. Thanks to a new similarity measure, we can select the Boolean property encoding technique that preserves the similarity structures holding in the raw data. The discussion relies on several experimental results for three gene expression data sets. We believe our framework is an interesting direction of work for the many application domains in which (a) local set patterns have proved useful, and (b) Boolean properties have to be derived from raw numerical data.
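
To make the encoding step concrete, the following is a minimal sketch of how an over-expression property might be derived from a numerical expression matrix. The abstract does not specify the discretization techniques or the similarity measure used in the paper, so everything here is an assumption made for illustration: the two thresholding rules, the function names, the `fraction` parameter, and the toy data are all hypothetical. The sketch also assumes positive expression values, with one row per gene and one column per biological situation.

```python
from typing import Callable, List

Row = List[float]
Matrix = List[Row]


def threshold_fraction_of_max(row: Row, fraction: float = 0.9) -> float:
    """Hypothetical rule: cutoff is a fixed fraction of the gene's maximum value."""
    return fraction * max(row)


def threshold_mid_range(row: Row) -> float:
    """Hypothetical rule: cutoff is halfway between the gene's minimum and maximum."""
    return (min(row) + max(row)) / 2.0


def encode_over_expression(
    matrix: Matrix, threshold_rule: Callable[[Row], float]
) -> List[List[int]]:
    """Return a Boolean (0/1) matrix: 1 where a value reaches its gene's cutoff."""
    encoded = []
    for row in matrix:
        cutoff = threshold_rule(row)
        encoded.append([1 if value >= cutoff else 0 for value in row])
    return encoded


def density(boolean_matrix: List[List[int]]) -> float:
    """Fraction of cells set to 1, a rough indicator of how many patterns to expect."""
    cells = [cell for row in boolean_matrix for cell in row]
    return sum(cells) / len(cells)


# Toy data (invented): 3 genes measured in 4 situations.
raw = [
    [1.0, 4.0, 9.5, 10.0],
    [2.0, 2.5, 8.0, 3.0],
    [5.0, 5.5, 6.0, 1.0],
]

by_fraction = encode_over_expression(raw, threshold_fraction_of_max)
by_mid_range = encode_over_expression(raw, threshold_mid_range)

print(by_fraction, density(by_fraction))
print(by_mid_range, density(by_mid_range))
```

The two rules yield different Boolean matrices from the same raw data, which is why the choice of encoding affects the number and quality of the patterns mined afterwards. Selecting between such encodings by comparing dendrograms, as the abstract describes, would be a separate step and is not reproduced here.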